Preface


Intended Audience 


This manual is written for the customer service engineer. 


Document Structure 


This manual uses a structured documentation design. Topics are organized into 
small sections for efficient online and printed reference. Each topic begins with an 
abstract, followed by an illustration or example, and ends with descriptive text.


This manual has seven chapters and three appendixes, as follows: 


Chapter 1, System Overview, introduces the DIGITAL AlphaServer 4000/4100 
pedestal and cabinet systems and gives an overview of the system bus modules. 


Chapter 2, Power-Up, provides information on how to interpret the power-up 
display on the operator control panel, the console screen, and system LEDs. It 
also describes how hardware diagnostics execute when the system is initialized. 


Chapter 3, Troubleshooting, describes troubleshooting during power-up and 
booting, as well as the test command. 


Chapter 4, Power System, describes the AlphaServer 4000/4100 power system. 


Chapter 5, Error Logs, explains how to interpret error logs and how to use 
DECevent. 


Chapter 6, Error Registers, describes the error registers used to hold error 
information. 


Chapter 7, Removal and Replacement, describes removal and replacement 
procedures for field-replaceable units (FRUs). 


Appendix A, Running Utilities, explains how to run utilities such as the EISA 
Configuration Utility and RAID Standalone Configuration Utility. 




Appendix B, SRM Console Commands and Environment Variables,
summarizes the commands used to examine and alter the system configuration.


Appendix C, Operating the System Remotely, describes how to use the remote
console monitor (RCM) to monitor and control the system remotely.


Documentation Titles 


Table 1 lists titles related to AlphaServer 4000/4100 systems. 


Table 1 AlphaServer 4000/4100 Documentation 


Title                                              Order Number

AlphaServer 4100 User and Configuration            QZ-00VAA-GZ
Documentation Kit
  System Drawer User’s Guide                       EK-4100A-UG
  Configuration and Installation Guide             EK-4100A-CG
AlphaServer 4000 User and Configuration            QZ-00VAB-GZ
Documentation Kit
  System Drawer User’s Guide                       EK-4000A-UG
  Configuration and Installation Guide             EK-4100A-CG
Service Manual (hard copy)                         EK-4100A-SV
Service Manual (diskette)                          AK-QXBJB-CA
System Drawer Upgrades                             EK-4041A-UI
PCI Upgrade                                        EK-4000A-UI
KN30n CPU Installation Card                        EK-KN300-IN
MS3n0 Memory Installation Card                     EK-MS300-IN
H7291 Power Supply Installation Card               EK-H7291-IN
ServerWORKS Manager Administrator User’s Guide     ER-4QXAA-UA




Information on the Internet


Using a Web browser you can access the AlphaServer InfoCenter at: 


http://www.digital.com/info/alphaserver/products.html


Access the latest system firmware either with a Web browser or via FTP as follows: 


ftp://ftp.digital.com/pub/Digital/Alpha/firmware/ 
Interim firmware released since the last firmware CD is located at: 


ftp://ftp.digital.com/pub/Digital/Alpha/firmware/interim/


Chapter 1
System Overview 


This chapter introduces the DIGITAL AlphaServer 4000 and the DIGITAL 
AlphaServer 4100 systems. These systems are available in cabinets or pedestals. 


There are three system drawers; two, the BA30B and the BA30C, are used in the 
AlphaServer 4000, and the third, the BA30A, is used in the AlphaServer 4100. 


The pedestal system has one system drawer and up to three StorageWorks shelves. 
The cabinet system can have a combination of system drawers and StorageWorks 
shelves that occupy the five sections of the cabinet. 


Topics in this chapter include the following: 

• AlphaServer 4100 System Drawer (BA30A)
• AlphaServer 4000 System Drawer (BA30C)
• AlphaServer 4000 System Drawer (BA30B)
• Cabinet System
• Pedestal System
• Control Panel and Drives
• System Consoles
• System Architecture
• System Motherboard
• CPU Types
• Memory Modules
• Memory Addressing
• System Bus
• System Bus to PCI Bus Bridge Module
• PCI I/O Subsystem
• Server Control Module
• Power Control Module
• Power Supply


1.1 AlphaServer 4100 System Drawer (BA30A)


Components in the BA30A system drawer are located in the system bus card
cage, the PCI card cage, the control panel assembly, and the power and cooling
section. The drawer measures 30 cm x 45 cm (11.8 in. x 17.7 in.) and fully
configured weighs approximately 45.5 kg (~100 lbs).


Figure 1-1 Components of the BA30A System Drawer 


PK-0702-96 


When the system drawer is in a pedestal, the control panel assembly is mounted in a 
tray at the top of the drawer. 


The numbered callouts in Figure 1-1 refer to components of the system drawer.


1. System card cage, which holds the system motherboard and the CPU, memory,
   bridge, and power control modules. (The difference between the BA30A and the
   BA30C is the system motherboard.)

2. PCI/EISA card cage, which holds the PCI motherboard, option cards, and server
   control module.

3. Server control module, which holds the I/O connectors and remote console
   monitor.

4. Control panel assembly, which includes the control panel, a floppy drive, and a
   CD-ROM drive.

5. Power and cooling section, which contains one to three power supplies and fans.


Cover Interlocks 


The system drawer has three cover interlocks: one for the system bus card cage, one 
for the PCI card cage, and one for the power and system fan area. 


Figure 1-2 Cover Interlock Circuit


NOTE: The cover interlocks must be engaged to enable power-up.


To override the cover interlocks, find a suitable object to close the interlock circuit.


1.2 AlphaServer 4000 System Drawer (BA30C)


Components in the BA30C system drawer are located in the system bus card
cage, PCI card cage, control panel assembly, and power and cooling section.
The drawer measures 30 cm x 45 cm (11.8 in. x 17.7 in.) and fully configured
weighs approximately 45.5 kg (~100 lbs).


Figure 1-3 Components of the BA30C System Drawer


When the system drawer is in a pedestal, the control panel assembly is mounted in a
tray at the top of the drawer. 


The numbered callouts in Figure 1-3 refer to components of the system drawer.


1. System card cage, which holds the system motherboard and the CPU, memory,
   bridge, and power control modules. (The difference between the BA30A and the
   BA30C is the system motherboard.)

2. PCI/EISA card cage, which holds the PCI motherboard, option cards, and server
   control module.

3. Server control module, which holds the I/O connectors and remote console
   monitor.

4. Control panel assembly, which includes the control panel, a floppy drive, and a
   CD-ROM drive.

5. Power and cooling section, which contains one to three power supplies and fans.


Cover Interlocks


The system drawer has three cover interlocks: one for the system bus card cage, one 
for the PCI card cage, and one for the power and system fan area. 


Figure 1-4 Cover Interlock Circuit 


NOTE: The cover interlocks must be engaged to enable power-up.


To override the cover interlocks, find a suitable object to close the interlock circuit.


1.3 AlphaServer 4000 System Drawer (BA30B)


Components in the BA30B system drawer are located in the system bus card
cage, two PCI card cages, the control panel assembly, and the power and cooling
section. The drawer measures 30 cm x 45 cm (11.8 in. x 17.7 in.) and fully
configured weighs approximately 45.5 kg (~100 lbs).


Figure 1-5 Components of the BA30B System Drawer


When the system drawer is in a pedestal, the control panel assembly is mounted in a
tray at the top of the drawer. 


The numbered callouts in Figure 1-5 refer to components of the system drawer.


1. System card cage holds the system motherboard, the CPU, memory, bridge, and
   power control modules.

2. PCI/EISA card cage holds the PCI/EISA motherboard for PCI/EISA 0 and PCI
   1, option cards, and server control module.

3. Server control module holds the I/O connectors and remote console.

4. Control panel assembly holds the control panel, a floppy, and a CD-ROM.

5. Power and cooling section contains one to three power supplies and three fans.

6. PCI card cage holds the PCI motherboard for PCI 2 and PCI 3.


Cover Interlocks 


The system drawer has four cover interlocks: one for each section of the drawer. 


Figure 1-6 Cover Interlock Circuit


1.4 Cabinet System


The AlphaServer 4000/4100 cabinet system can accommodate multiple systems
in a single cabinet. There are four cabinet variations that can hold different
system configurations. Differences are in power distribution and drawer
mounting; from the outside the cabinets look almost identical.


Figure 1-7 AlphaServer 4000/4100 Cabinet System


Cabinet Differences


Cabinet    Power                     Mounting            Destination

H9A10-EB   AC input box,             C channel           North America,
           power strips              (max drawers: 4)    Asia Pacific

H9A10-EC   AC input box,             C channel           Europe
           power strips              (max drawers: 4)

H9A10-EL   Two 120 volt H7600-AA     Pull-out tray       North America,
           power controllers         (max drawers: 3)    Asia Pacific

H9A10-EM   Two 240 volt H7600-DB     Pull-out tray       Europe
           power controllers         (max drawers: 3)

Cabinet System Fan Tray 


At the top of cabinet systems is a fan tray containing three exhaust fans, a small 12- 
volt power supply, and a module that distributes power to the server control module 
in each drawer. 


Figure 1-8 Cabinet Fan Tray


1.5 Pedestal System


The pedestal system contains one system drawer with a control panel, a
CD-ROM drive, and a floppy drive. In the pedestal control panel area there is
space for an optional tape or disk drive. Three StorageWorks shelves provide up
to 90 Gbytes of in-cabinet storage.


Figure 1-9 Pedestal System Front 


In the pedestal system, the control panel is located at the top left in a tray. See 
Figure 1-11. There is space for an optional device beside it.


Figure 1-10 Pedestal System Rear


1.6 Control Panel and Drives


The control panel includes the On/Off, Halt, and Reset buttons and a display. In 
a pedestal system the control panel is located in a tray at the top of the system 
drawer. In a cabinet system it is at the bottom of the system drawer with the 
CD-ROM drive and the floppy drive. 


Figure 1-11 Control Panel Assembly


On/Off button. Powers the system drawer on or off. When the LED at the top
of the button is lit, the power is on. The On/Off button is connected to the
power supplies and the system interlocks.


NOTE: The LEDs on some modules are on when the line cord is plugged in,
regardless of the position of the On/Off button.


Halt button. Pressing this button in (so the LED at the top of the button is on) 
does the following: 


If DIGITAL UNIX or OpenVMS is running, halts the operating system and 
returns to the SRM console. The Halt button has no effect on Windows NT. 


If the Halt button is in when the system is reset or powered up, the system halts 
in the SRM console, regardless of the operating system. DIGITAL UNIX and 
OpenVMS systems that are configured for autoboot will not boot if the Halt 
button is in. Windows NT systems halt in the SRM console; AlphaBIOS is not 
loaded and started. 


If you press the Halt button in (LED on) and do not issue commands that 
disturb the system state, entering the continue command returns the system to 
the operating system it was running. To return to console mode again, press 
the Halt button (LED off) and then press it again (LED on). 


If the system is hung, pressing the Halt button (LED on) usually brings up the
SRM console. Enter the crash command to do a crash dump. If pressing the
Halt button does not bring up the SRM console, there is probably a hardware
fault that is not allowing the halt signal to pass from the XBUS to the CPU.


Reset button. Initializes the system drawer. If the Halt button is pressed (LED 
on) when the system is reset, the SRM console is loaded and remains in the 
system regardless of any other conditions. 


Control panel display. Indicates status during power-up and self-test. The
OCP display is a 16-character LCD. Its controller is on the XBUS on the PCI
motherboard.


While the operating system is running, the display shows the system type by
default. This message can be changed by the user.


CD-ROM drive. The CD-ROM drive is used to load software, firmware, and 
updates. Its controller is on PCI1 on the PCI motherboard. 


Floppy disk drive. The floppy drive is used to load software and firmware 
updates. The floppy controller is on the XBUS on the PCI motherboard.


1.7 System Consoles


There are two console programs: the SRM console and the AlphaBIOS console. 


SRM Console Prompt 


On systems running the DIGITAL UNIX or OpenVMS operating system, the 
following console prompt is displayed after system startup messages are displayed, 
or whenever the SRM console is invoked: 


P00>>>


NOTE: The console prompt displays only after the entire power-up sequence is 
complete. This can take up to several minutes if the memory is very large. 


AlphaBIOS Boot Menu 


On systems running the Windows NT operating system, the Boot menu is displayed 
when the AlphaBIOS console is invoked: 


AlphaBIOS Version 5.12 


Please select the operating system to start: 


Windows NT Server 3.51 


Use ↑ and ↓ to move the highlight to your choice.
Press Enter to choose.


Press <F2> to enter SETUP


SRM Console


The SRM console is a command-line interface that is used to boot the DIGITAL 
UNIX and OpenVMS operating systems. It also provides support for examining and 
modifying the system state and configuring and testing the system. The SRM 
console can be run from a serial terminal or a graphics monitor. 


AlphaBIOS Console 


The AlphaBIOS console is a menu-based interface that supports the Microsoft 
Windows NT operating system. AlphaBIOS is used to set up operating system 
selections, boot Windows NT, and display information about the system 
configuration. The EISA Configuration Utility and the RAID Standalone 
Configuration Utility are run from the AlphaBIOS console. AlphaBIOS runs on 
either a serial or graphics terminal, but Windows NT requires a graphics monitor. 


Environment Variables 


Environment variables are software parameters that define, among other things, the 
system configuration. They are used to pass information to different pieces of 
software running in the system at various times. The os_type environment variable, 
which can be set to VMS, UNIX, or NT, determines which of the two consoles is to 
be used. The SRM console is always brought into memory, but AlphaBIOS is 
loaded if os_type is set to NT and the Halt button is out (not lit). 
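
The console selection rule described above can be sketched in a few lines of Python. This is an illustration only, not console firmware: the function name `consoles_loaded` and its arguments are hypothetical, and the only behavior modeled is what the paragraph states (SRM is always loaded; AlphaBIOS is loaded when os_type is NT and the Halt button is out).

```python
def consoles_loaded(os_type: str, halt_button_in: bool) -> list[str]:
    """Return the console programs brought into memory (illustrative model).

    os_type        -- value of the os_type environment variable (VMS, UNIX, or NT)
    halt_button_in -- True when the Halt button is pressed in (LED lit)
    """
    consoles = ["SRM"]  # the SRM console is always brought into memory
    if os_type.upper() == "NT" and not halt_button_in:
        consoles.append("AlphaBIOS")  # loaded only for NT with Halt out
    return consoles


print(consoles_loaded("NT", halt_button_in=False))
print(consoles_loaded("NT", halt_button_in=True))
print(consoles_loaded("UNIX", halt_button_in=False))
```

The second call reflects the Halt button behavior described in Section 1.6: with the button in, a Windows NT system stays in the SRM console.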


Refer to Appendix B of this guide for a list of the environment variables used to 
configure AlphaServer 4000 and 4100 systems. 


Refer to the AlphaServer 4x00 System Drawer User’s Guide for information on 
setting environment variables. 


It is recommended that you keep a record of the environment variables for each 
system that you service. Some environment variable settings are lost when a module 
is swapped and must be restored after the new module is installed. Refer to 
Appendix B for a convenient worksheet for recording environment variable settings.


1.8 System Architecture


Alpha microprocessor chips are used in these systems. The CPU, memory, and 
the I/O bridge module(s) are connected to the system bus motherboard. 


Figure 1-12 Architecture Diagram


AlphaServer 4000/4100 systems use the Alpha chip for the CPU. The CPU,
memory, and I/O bridge modules are connected to the system bus motherboard.
One bridge module connects to the PCI/EISA I/O buses; a second (4000 only)
connects to another pair of PCI buses. A fourth type of module, the power
control module, also plugs into the system motherboard.


A fully configured 4100 system drawer can have up to four CPUs, four memory
pairs, and a total of eight I/O options. A fully configured 4000 system drawer
can have up to two CPUs, two memory pairs, and a total of sixteen I/O options.
In both systems the I/O options can be all PCI options or a combination of PCI
options and EISA options, but there can be no more than three EISA options.


The system bus has a 144-bit data bus (128 data bits protected by 16 bits of ECC)
and a 40-bit command/address bus protected by parity. The bus speed depends on
the speed of the CPU in slot 0, which provides the clock for the buses. The 40-bit
address bus can address one terabyte (2^40 bytes, roughly a million million). The
bus connects CPUs, memory, and the system bus to PCI bus bridge(s).
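
The bus widths quoted here can be checked with a little arithmetic. The sketch below assumes 128 data bits plus 16 ECC bits, which follows from the memory description (each module of a pair supplies 64 bits plus 8 ECC bits); the constant names are illustrative.

```python
ADDRESS_BITS = 40   # width of the command/address bus
DATA_BITS = 128     # data bits per transfer (assumed: 2 modules x 64 bits)
ECC_BITS = 16       # ECC bits per transfer (assumed: 2 modules x 8 bits)

address_space_bytes = 2 ** ADDRESS_BITS
total_data_lines = DATA_BITS + ECC_BITS

print(address_space_bytes)             # 1099511627776 bytes
print(address_space_bytes // 2 ** 40)  # 1 terabyte
print(total_data_lines)                # 144 lines
```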


The CPU modules are available with and without an external cache. The Alpha chip 
has an 8-Kbyte instruction cache (I-cache), an 8-Kbyte write-through data cache (D- 
cache), and a 96-Kbyte, write-back secondary data cache (S-cache). Some variants 
of the CPU module include an onboard cache. The cache system is write-back. The 
system drawer supports up to four CPUs. 


The memory modules are placed on the system motherboard in pairs. Each module 
drives half of the system bus, along with the associated ECC bits. Memory pairs 
consist of two modules that are the same size and type. Two types are available: 
synchronous and asynchronous (EDO) memory. 


The system bus to PCI bus bridge module translates system bus commands and data 
addressed to I/O space to PCI commands and data. It also translates PCI bus 
commands and data addressed to system memory or CPUs to system bus commands 
and data. The PCI bus is a 64-bit wide bus used for I/O. Both the 4100 and the 4000 
have one PCI/EISA card cage, and the 4000 may contain a second PCI card cage. 


The power control module, which is on the system motherboard, monitors power and 
the system environment.


1.9 System Motherboard


The system motherboard is on the floor of the system card cage. It has slots for 
the CPU, memory, power control, and bridge modules. 


Figure 1-13 System Motherboard Module Locations 


4100 Motherboard (54-23803-01)
4000 Motherboard (54-23803-02)
4000 Motherboard (54-23805-01)


The system motherboard has the logic for the system bus. It is the backplane that 
holds the CPU, memory, bridge, and power control modules. Figure 1-13 shows 
diagrams of the three motherboards used in AlphaServer 4000/4100 systems. The 
module locations are designated by the callouts. 


1. CPU module
2. Memory module
3. Bridge module
4. Power control module


1.10 CPU Types


AlphaServer 4000 and 4100 systems can be configured with one of several CPU 
variants. Variants are differentiated by CPU speeds and the presence or 
absence of a backup data cache external to the Alpha microprocessor chip. 


Figure 1-14 CPU Module Layout


Alpha Chip Composition


The Alpha chip is made using state-of-the-art chip technology, has a transistor count
of 9.3 million, consumes 50 watts of power, and is air cooled (a fan is on the chip).
The cache system is write-back by default; when the module has an external cache,
that cache is also write-back.


Chip Description


Unit          Description

Instruction   8-Kbyte cache, 4-way issue

Execution     4-way execution; 2 integer units, 1 floating-point adder,
              1 floating-point multiplier

Memory        Merge logic, 8-Kbyte write-through first-level data cache,
              96-Kbyte write-back second-level data cache, bus
              interface unit


CPU Variants


Module Variant   Clock Frequency   Onboard Cache

B3001-CA         300 MHz           None
B3002-AB         300 MHz           2 Mbytes
B3004-BA         300 MHz           2 Mbytes
B3004-AA         400 MHz           4 Mbytes
B3004-DA         466 MHz           4 Mbytes


CPU Configuration Rules 
• The first CPU must be in CPU slot 0 to provide the system clock.

• Additional CPU modules should be installed in ascending order by slot number.

• All CPUs must have the same Alpha chip clock speed. The system bus will
  hang without an error message if the oscillators clocking the CPUs are different.

• Mixing of cached and uncached CPUs is not supported.
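
The configuration rules above lend themselves to a quick pre-installation check. The following Python sketch is a hypothetical helper, not a DIGITAL tool: the function name, the slot representation, and the message wording are all assumptions made for illustration.

```python
from typing import Optional


def check_cpu_config(slots: list[Optional[tuple[int, int]]]) -> list[str]:
    """Check a drawer's CPU slots against the rules listed above.

    slots[i] is None for an empty slot, otherwise a tuple of
    (clock speed in MHz, onboard cache in Mbytes; 0 means uncached).
    Returns a list of problems; an empty list means no rule is violated.
    """
    problems = []
    if not slots or slots[0] is None:
        problems.append("CPU slot 0 is empty; it must provide the system clock")

    # Ascending order: no populated slot may follow an empty one.
    seen_empty = False
    for i, cpu in enumerate(slots):
        if cpu is None:
            seen_empty = True
        elif seen_empty:
            problems.append(f"CPU in slot {i} follows an empty slot")
            break

    installed = [cpu for cpu in slots if cpu is not None]
    if len({clock for clock, _ in installed}) > 1:
        problems.append("CPUs have different clock speeds")
    if len({cache > 0 for _, cache in installed}) > 1:
        problems.append("cached and uncached CPUs are mixed")
    return problems


print(check_cpu_config([(300, 2), (300, 2), None, None]))
print(check_cpu_config([(300, 0), (300, 2)]))
```

A clock-speed mismatch is the most important case to catch before power-up, since the text notes that the system bus hangs without an error message.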


Color Codes 


The top edge of the CPU module variant is color coded for easy identification. 


Color       Option Number   Description

Dark Blue   B3001-CA        300 MHz, uncached
Green       B3002-AB        300 MHz, 2MB cached
Green       B3004-BA        300 MHz, 2MB cached
Orange      B3004-AA        400 MHz, 4MB cached
Red         B3004-DA        466 MHz, 4MB cached


1.11 Memory Modules


Memory modules are used only in pairs — two modules of the same size and 
type. Each module provides either the low half or the high half of the memory 
space. The 4100 system drawer can hold up to four memory module pairs. The 
4000 system drawer can hold up to two memory module pairs. 


Figure 1-15 Memory Module Layout


Memory Variants


Each memory option consists of two identical modules. Each 4100 drawer supports
up to four memory options, for a total of 4 Gbytes of memory; 4000 drawers support
half that. Memory modules are used only in pairs and are available in 128-Mbyte,
512-Mbyte, and 1-Gbyte sizes. The 128-Mbyte option is synchronous memory,
while the larger sizes are asynchronous memory (EDO).


                                               DRAM     DRAM
Option     Size     Module     Type            Number   Size

MS320-CA   128 MB   B3020-CA   Synch.          36       4 MB x 4
MS330-EA   512 MB   B3030-EA   Asynch. (EDO)   144      4 MB x 4
MS330-FA   1 GB     B3030-FA   Asynch. (EDO)   72       16 MB x 4
MS330-GA   2 GB     B3030-GA   Asynch. (EDO)   144      16 MB x 4


Memory Operation 


Memory modules are used only in pairs; each module provides half the data, or 64 
bits plus 8 ECC bits, of the octaword (16 byte) transferred on the system bus. 
Modules are placed in slots designated MEMxL and MEMxH. 


NOTE: Modules in slots MEMxL do not drive the lower 8 bytes, and modules in
slots MEMxH do not drive the higher 8 bytes of the 16-byte transfer.


Unless otherwise programmed, memory drives the system bus in bursts. Upon each 
memory fetch, data is transferred in 4 consecutive cycles transferring 64 bytes. 
There are situations, however, when memories made with EDO DRAMs cannot 
provide data fast enough to complete the system bus transactions. When these 
situations arise, EDO type memories assert a signal that causes the system bus to 
stall for one (occasionally more) clock tick. When memory completes such an 
operation, it releases the system bus. 
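
The burst figures above follow directly from the 16-byte transfer size. The short sketch below works through the arithmetic; the assumption that each EDO stall simply adds one tick to the fetch is a simplification for illustration, and the names are hypothetical.

```python
BYTES_PER_TRANSFER = 16  # one octaword per data cycle on the system bus
CYCLES_PER_FETCH = 4     # a memory fetch bursts over 4 consecutive cycles


def fetch_bytes() -> int:
    """Bytes delivered by one memory fetch."""
    return BYTES_PER_TRANSFER * CYCLES_PER_FETCH


def fetch_ticks(stall_ticks: int = 0) -> int:
    """Bus clock ticks for one fetch, with optional EDO stall ticks added
    (assumed additive; the real bus timing may differ)."""
    if stall_ticks < 0:
        raise ValueError("stall_ticks must not be negative")
    return CYCLES_PER_FETCH + stall_ticks


print(fetch_bytes())   # 64
print(fetch_ticks(1))  # 5
```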


Memory Configuration Rules 


In a system, memories of different sizes and types are permitted, but: 


• Memory modules are installed and used in pairs. Both modules in a memory
  pair must be of the same size and type.

• The largest memory pair must be in slots MEM0L and MEM0H.

• Other memory pairs must be the same size or smaller than the first memory pair.

• Memory pairs must be installed in consecutive slots.
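
As a hedged illustration, the memory configuration rules can be expressed as a small checker. Everything named here (`check_memory_pairs`, the tuple layout, the "sync"/"edo" labels) is hypothetical and chosen only to mirror the rules in the list above.

```python
from typing import Optional

Module = tuple[int, str]      # (size in Mbytes, type such as "sync" or "edo")
Pair = tuple[Module, Module]  # (module in MEMxL, module in MEMxH)


def check_memory_pairs(pairs: list[Optional[Pair]]) -> list[str]:
    """Check memory pairs against the rules above.

    pairs[n] describes slots MEMnL/MEMnH, or is None when the pair is empty.
    Returns a list of problems; an empty list means no rule is violated.
    """
    problems = []
    for n, pair in enumerate(pairs):
        if pair is not None and pair[0] != pair[1]:
            problems.append(f"MEM{n}L and MEM{n}H differ in size or type")

    if not pairs or pairs[0] is None:
        problems.append("MEM0L/MEM0H must hold the largest pair but is empty")
        return problems

    first_size = pairs[0][0][0] + pairs[0][1][0]
    seen_empty = False
    for n, pair in enumerate(pairs[1:], start=1):
        if pair is None:
            seen_empty = True
            continue
        if seen_empty:
            problems.append(f"pair {n} is not in a consecutive slot")
        if pair[0][0] + pair[1][0] > first_size:
            problems.append(f"pair {n} is larger than the pair in slot 0")
    return problems


big = ((256, "edo"), (256, "edo"))
small = ((64, "sync"), (64, "sync"))
print(check_memory_pairs([big, small, None, None]))
print(check_memory_pairs([small, big]))
```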


1.12 Memory Addressing


Alpha system memory addressing is unusual because memory address space is
determined not by the amount of physical memory but as a multiple of the size
of the memory pair in slot MEM0x.


Figure 1-16 How Memory Addressing Is Calculated


The rules for addressing memory are as follows:

1. Address space is determined by the memory pair in slot MEM0.

2. Memory pairs need not be the same size.

3. The memory pair in slot MEM0 must be the largest of all memory pairs. Other
   memory pairs may be as large but none may be larger.

4. The starting address of each memory pair is N times the size of the memory pair
   in slot MEM0. N=0,1,2,3.

5. Memory addresses are contiguous within each module pair.

6. If memory pairs are of different sizes, memory “holes” can occur in the physical
   address space. See Figure 1-16.

7. Software creates contiguous virtual memory even though physical memory may
   not be contiguous.
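
The addressing rules can be made concrete with a short calculation. The Python sketch below is illustrative only (the function names and the example pair sizes are assumptions): it places pair N at N times the size of the pair in slot 0 and reports any holes between pairs.

```python
MB = 2 ** 20


def pair_ranges(pair_sizes: list[int]) -> list[tuple[int, int, int]]:
    """Return (slot, first address, last address) for each populated pair.

    pair_sizes[n] is the size in bytes of the pair in slot n (0 if empty).
    Pair n starts at n times the size of the pair in slot 0.
    """
    stride = pair_sizes[0]
    if stride <= 0:
        raise ValueError("slot 0 must hold the largest memory pair")
    return [(n, n * stride, n * stride + size - 1)
            for n, size in enumerate(pair_sizes) if size]


def holes(pair_sizes: list[int]) -> list[tuple[int, int]]:
    """Return (first, last) address of each hole below the highest pair."""
    stride = pair_sizes[0]
    if stride <= 0:
        raise ValueError("slot 0 must hold the largest memory pair")
    last = max(n for n, size in enumerate(pair_sizes) if size)
    return [(n * stride + size, (n + 1) * stride - 1)
            for n, size in enumerate(pair_sizes[:last]) if size < stride]


sizes = [1024 * MB, 512 * MB, 128 * MB, 0]
for slot, first, last_addr in pair_ranges(sizes):
    print(f"pair {slot}: {first:#011x} - {last_addr:#011x}")
for first, last_addr in holes(sizes):
    print(f"hole:   {first:#011x} - {last_addr:#011x}")
```

With a 1-Gbyte pair in slot 0 followed by a 512-Mbyte pair and a 128-Mbyte pair, the third pair starts at 2 Gbytes, leaving a 512-Mbyte hole above the second pair.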


1.13 System Bus


The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC 
data bus, and several control signals and clocks. 


Figure 1-17 System Bus Block Diagram


The system bus motherboard consists of a 40-bit command/address bus, a 128-bit
plus ECC data bus, and several control signals, clocks, and a bus arbiter. The bus
requires that all CPUs have the same high-speed oscillator providing the clock to the
Alpha chip.


The AlphaServer 4100 system bus connects up to four CPUs, four pairs of memory
modules, and a single I/O bus bridge module. Note that the I/O bus bridges may be
designated as IODn where n is the number of the PCI bus. The first bridge is
designated IOD0 and IOD1.


The AlphaServer 4000 system bus connects up to two CPUs, two pairs of memory 
modules, and two I/O bus bridge modules. The second bridge on the 4000 system 
bus is designated IOD2 and IOD3. 


The system bus clock is provided by an oscillator on the CPU in slot CPU0. This
oscillator has a 1:5 ratio to the Alpha chip. With 300 MHz CPUs, for example, the 
system bus operates at 60 MHz. 


The system bus motherboard initiates memory refresh transactions. The 
motherboard sits at the bottom of the system drawer, and in addition to CPUs, 
memory, and I/O bridges, holds a power control module. 


5 volt and 3.43 volt power is provided directly to the motherboard from the power 
supplies.


1.14 System Bus to PCI Bus Bridge Module


The bridge module is the physical interconnect between the system motherboard 
and any PCI motherboard in the system. 


Figure 1-18 Bridge Module


The system bus to PCI bus bridge module converts system bus commands and data
addressed to I/O space to PCI commands and data; and converts PCI bus commands 
and data addressed to system memory or CPUs to system bus commands and data. 
An AlphaServer 4100 system has one bridge module; an AlphaServer 4000 system 
can have a second bridge module. 


The bridge has two major components: 
• Command/address processor (CAP) chip
• Two data path chips (MDPA and MDPB)


There are two sets of these three chips, one set on each side of the module. Each set 
bridges to one of the PCI buses on the PCI motherboard. 


The interface on the system bus side of the bridge responds to system bus commands 
addressed to the upper 64 Gbytes of I/O space. I/O space is addressed whenever bit 
<39> on the system bus address lines is set. The space so defined is 512 Gbytes in 
size. The first 448 Gbytes are reserved and the last 64 Gbytes, when bits <38:36> 
are set, are mapped to the PCI I/O buses. 
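
The address decode described above can be illustrated with a few bit operations. This sketch is a model of the stated rules, not of the bridge hardware: the function name and the region labels are hypothetical, and addresses with bit <39> clear are simply labeled "memory".

```python
IO_SPACE_BIT = 39              # bit <39> set selects I/O space
PCI_SELECT_MASK = 0b111 << 36  # bits <38:36> all set selects the PCI window


def classify(addr: int) -> str:
    """Classify a 40-bit system bus address as 'memory', 'reserved', or 'pci'."""
    if not 0 <= addr < 2 ** 40:
        raise ValueError("address must fit in 40 bits")
    if not (addr >> IO_SPACE_BIT) & 1:
        return "memory"    # bit <39> clear: not I/O space
    if addr & PCI_SELECT_MASK == PCI_SELECT_MASK:
        return "pci"       # upper 64 Gbytes of I/O space
    return "reserved"      # first 448 Gbytes of I/O space


GB = 2 ** 30
print(2 ** 39 // GB)              # 512 (size of I/O space in Gbytes)
print(2 ** 36 // GB)              # 64  (size of the PCI window in Gbytes)
print((2 ** 39 - 2 ** 36) // GB)  # 448 (reserved portion in Gbytes)
print(classify(1 << 39))
print(classify((1 << 39) | PCI_SELECT_MASK))
```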


The interface on the PCI side of the bridge responds to commands addressed to 
CPUs and memory on the system bus. On the PCI side, the bridge provides the 
interface to the PCIs. Each PCI bus is addressed separately. The bridge does not 
respond to devices communicating with each other on the same PCI bus. However, 
should a device on one PCI address a device on the other PCI bus, commands, 
addresses, and data run through the bridge out onto the system bus and back through 
the bridge to the other PCI bus. 


In addition to its bridge function, the system bus to PCI bus bridge module monitors 
every transaction on the system bus for errors. It monitors the data lines for ECC 
errors and the command/address lines for parity errors. 


NOTE: When errors are logged, the two bridge modules on the AlphaServer 4000
are differentiated in the error log by their engineering code names, the left-hand
horse and the right-hand horse. The left-hand horse is the B3040-AA module; it is in
the leftmost slot on the system bus motherboard when seen from the rear of the
drawer. The right-hand horse is the B3040-AB module, and it is in the rightmost
slot on the system bus motherboard when seen from the rear of the drawer.


1.15 PCI I/O Subsystem


The I/O subsystem is PCI. Both the 4100 and the 4000 have two four-slot PCI
buses that hold up to eight I/O options. One of these buses supports both PCI
and EISA but can hold no more than four options, up to three of which may be
EISA. The 4000 can have an additional two four-slot PCI buses, allowing a total
of sixteen I/O options.


Figure 1-19 PCI Block Diagram


Table 1-1 PCI Motherboard Slot Numbering


                                                  PCI2           PCI3
Slot   PCI0                PCI1                   (4000 only)    (4000 only)

0      Reserved            Reserved               Reserved       Reserved

1      PCI to EISA         Internal CD-ROM        Reserved       Reserved
       bridge              controller

2      PCI or EISA slot    PCI slot               PCI slot       PCI slot

3      PCI or EISA slot    PCI slot               PCI slot       PCI slot

4      PCI or EISA slot    PCI slot               PCI slot       PCI slot

5      PCI slot            PCI slot               PCI slot       PCI slot


The logic for two PCI buses is on each PCI motherboard. 


• PCI0 is a 64-bit bus with a built-in PCI to EISA bus bridge. PCI0 has one
  dedicated PCI slot and three slots that can be PCI or EISA slots (six
  connectors in all). Each of these three slots has an EISA connector and a PCI
  connector, only one of which may be used at a time. PCI0 is powered by 5V.

• PCI1 is a 64-bit bus with a built-in CD-ROM controller and four PCI slots.
  PCI1 is powered by 5V.

• PCI2 (4000 only) is a 64-bit four-slot PCI bus powered by both 3V and 5V.

• PCI3 (4000 only) is a 64-bit four-slot PCI bus powered by both 3V and 5V.


The B3050-AA PCI motherboard has cable connections to remote I/O (mouse,
keyboard, serial port, and parallel port), an internal floppy drive, an internal
CD-ROM drive, the control panel, and 5V power. Also on this module are the
chips for the PCI to EISA bridge and the internal CD-ROM controller. This module
is the motherboard for the PCI card cage on the left side of the system drawer.


An 8-bit XBUS is connected to the EISA bus. On this bus there is an interface to the
system I²C bus; mouse and keyboard support; an I/O combo controller supporting
two serial ports, the floppy controller, and a parallel port; a real-time clock; two
1-Mbyte flash ROMs containing system firmware; and an 8-Kbyte NVRAM.


The B3050-AB PCI motherboard, used only in the AlphaServer 4000, contains two 
four-slot 64-bit PCI buses. 
1.16 Server Control Module


The server control module enables remote console connections to the system
drawer. The module passes signals for COM ports 1 and 2, the keyboard, and
the mouse to the standard I/O connectors.


Figure 1-20 Server Control Module

The server control module has two sections: the remote console monitor (RCM) and
the standard I/O. See Appendix C for information on controlling the system 
remotely. 


The remote console monitor connects to a modem through the modem port on the 
bulkhead. The RCM requires a 12V power connection. 


The standard I/O ports (keyboard, mouse, COM1 and COM2 serial, and parallel 
ports) are on the same bulkhead. 
1.17 Power Control Module


The power control module controls power sequencing and monitors power
supply voltage, temperature, and fans.


Figure 1-21 Power Control Module

The power control module performs these functions:

•  Controls power sequencing.

•  Monitors the combined output of power supplies and shuts down power if it is
not in range.

•  Monitors system temperature and shuts off power if it is out of range.

•  Monitors the fans in the system drawer and on the CPU modules and shuts down
power if a fan fails.

•  Provides visual indication of faults through LEDs.
1.18 Power Supply


The system drawer power supplies provide power only to components in the 
drawer. One or two power supplies are required, depending on the number of 
CPU modules and PCI card cages; a second or third can be added for 
redundancy. The power system is described in detail in Chapter 4. 


Figure 1-22 Location of Power Supply 


Power Supply 2 
Power Supply 1 
Power Supply 0 




Description 


One to three power supplies provide power to components in the system drawer. 
(They supply power only for the drawer in which they are located.) Three power 
supplies provide redundant power in fully loaded AlphaServer 4000/4100 systems. 


These power supplies share the load, and redundant configurations are supported. 
They autoselect line voltage (120V to 240V). Each has 450 W output and supplies
up to 75A of 3.43V, 50A of 5.0V, 11A of 12V, and small amounts of -5V, -12V,
and auxiliary voltage (Vaux).


NOTE: The LEDs on some modules are on when the line cord is plugged in,
regardless of the position of the On/Off button.


Configuration 


•  An AlphaServer 4100 system with one or two CPUs requires one power supply
(two for redundancy).

•  An AlphaServer 4100 system with three or four CPUs requires two power
supplies (three for redundancy).

•  An AlphaServer 4000 system with one or two CPUs and one PCI card cage
requires one power supply (two for redundancy).

•  An AlphaServer 4000 system with one or two CPUs and two PCI card cages
requires two power supplies (three for redundancy).

•  Power supply 0 is installed first, power supply 2 second, and power supply 1
third. See Figure 1-22. (The power supply numbering shown here corresponds to
the numbering displayed by the SRM console's show power command.)
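
The configuration rules above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name, the argument names, and the decision to reject any combination the text does not list are assumptions, not part of the system firmware or any DIGITAL tool.

```python
# Hypothetical helper; encodes only the four configurations listed above.
INSTALL_ORDER = (0, 2, 1)  # power supply 0 first, then 2, then 1


def power_supplies_required(model: str, cpus: int, pci_cages: int = 1) -> tuple[int, int]:
    """Return (minimum, with_redundancy) power supply counts."""
    if model == "4100":
        if cpus in (1, 2):
            minimum = 1
        elif cpus in (3, 4):
            minimum = 2
        else:
            raise ValueError("4100: configuration not listed (1 to 4 CPUs expected)")
    elif model == "4000":
        if cpus not in (1, 2) or pci_cages not in (1, 2):
            raise ValueError("4000: configuration not listed")
        minimum = 1 if pci_cages == 1 else 2
    else:
        raise ValueError("model must be '4000' or '4100'")
    # One additional supply provides redundancy in every listed case.
    return minimum, minimum + 1


if __name__ == "__main__":
    print(power_supplies_required("4100", cpus=3))
    print(power_supplies_required("4000", cpus=2, pci_cages=2))
```

The slots to populate for a given count follow from INSTALL_ORDER; for example, INSTALL_ORDER[:2] gives supplies 0 and 2.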
Chapter 2
Power-Up


This chapter describes system power-up testing and explains the power-up displays.
The following topics are covered:

•  Control Panel
•  Power-Up Sequence
•  SROM Power-Up Test Flow
•  SROM Errors Reported
•  XSROM Power-Up Test Flow
•  XSROM Errors Reported
•  Console Power-Up Tests
•  Console Device Determination
•  Console Power-Up Display
•  Fail-Safe Loader


2.1 Control Panel 


The control panel display indicates the likely device when testing fails. 


Figure 2-1 Control Panel and LCD Display

•  When the On/Off button LED is on, power is applied and the system is running.
When it is off, the system is not running, but power may or may not be present.
If power is present, the PCM or the power LED on the system bus to PCI bus
bridge module should be flashing. Otherwise, there is a power problem.

•  When the Halt button LED is lit and the On/Off button is on, the system should
be running either the SRM console or Windows NT. If the Halt button is in, but
the LED is off, the OCP, its cables, or the PCM is likely to be broken.
Table 2-1 Control Panel Display

Field  Content           Display      Meaning
(1)    CPU number        P0-P3        CPU reporting status
(2)    Status            TEST         Tests are executing
                         FAIL         Failure has been detected
                         MCHK         Machine check has occurred
                         INTR         Error interrupt has occurred
(3)    Test number
(4)    Suspected device  CPU0-3       CPU module number [1]
                         MEM0-3 and   Memory pair number and low
                         L, H, or *   module, high module, or either [2]
                         IOD0         Bridge to PCI bus 0 [3]
                         IOD1         Bridge to PCI bus 1 [3]
                         IOD2         Bridge to PCI bus 2 [4]
                         IOD3         Bridge to PCI bus 3 [4]
                         FROM0        Flash ROM [5]
                         COMBO        COM controller [5]
                         PCEB         PCI-to-EISA bridge [5]
                         ESC          EISA system controller [5]
                         NVRAM        Nonvolatile RAM [5]
                         TOY          Real-time clock [5]
                         I8242        Keyboard and mouse controller [5]

[1] CPU module
[2] Memory module
[3] Bridge module (B3040-AA)
[4] Bridge module (B3040-AB)
[5] EISA/PCI motherboard

The potentiometer, accessible through the access hole just above the Reset button,
controls the intensity of the LCD. Use a small Phillips head screwdriver to adjust.
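
When control panel readings are transcribed into a service log, the four fields of Table 2-1 can be split apart mechanically. The Python sketch below assumes the fields appear in table order separated by spaces, as in the "P0 TEST 24 MEM**" message quoted in Section 2.9; the function and type names are hypothetical.

```python
from typing import NamedTuple

# Status strings and meanings taken from Table 2-1.
STATUS_MEANING = {
    "TEST": "Tests are executing",
    "FAIL": "Failure has been detected",
    "MCHK": "Machine check has occurred",
    "INTR": "Error interrupt has occurred",
}


class OcpStatus(NamedTuple):
    cpu: int      # 0-3, from the P0-P3 field
    status: str   # TEST, FAIL, MCHK, or INTR
    test: str     # test number, kept as text
    device: str   # suspected device, for example MEM** or IOD0


def parse_ocp_line(line: str) -> OcpStatus:
    """Split a transcribed control panel line into its four fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected four space-separated fields")
    cpu_field, status, test, device = fields
    if len(cpu_field) != 2 or cpu_field[0] != "P" or cpu_field[1] not in "0123":
        raise ValueError("CPU field must be P0 through P3")
    if status not in STATUS_MEANING:
        raise ValueError("unknown status: " + status)
    return OcpStatus(int(cpu_field[1]), status, test, device)


if __name__ == "__main__":
    reading = parse_ocp_line("P0 TEST 24 MEM**")
    print(reading, "-", STATUS_MEANING[reading.status])
```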
2.2 Power-Up Sequence


Console and most power-up tests reside on the I/O subsystem, not on the CPU
nor on any other module on the system bus.


Figure 2-2 Power-Up Flow

SROM. The SROM is a 128-Kbit ROM on each CPU module. The ROM contains
minimal diagnostics that test the Alpha chip and the path to the XSROM. Once the
path is verified, it loads XSROM code into the Alpha chip and jumps to it.


XSROM. The XSROM, or extended SROM, contains back-up cache and memory
tests, and a fail-safe loader. The XSROM code resides in sector 0 of FEPROM 0 on
the XBUS. Sector 2 of FEPROM 0 contains a duplicate copy of the code and is used
if sector 0 is bad.


FEPROM. Two 1-Mbyte programmable ROMs are on the XBUS on PCI0.
FEPROM 0 contains two copies of the XSROM, the OpenVMS and DIGITAL UNIX
PALcode, and the SRM console and decompression code. FEPROM 1 contains the
AlphaBIOS and NT HALcode. See Figure 2-3. These two FEPROMs can be flash
updated. Refer to Appendix A.


Figure 2-3 Contents of FEPROMs 
For the console to run, the path from the CPU to the XSROM must be functional.
The XSROM resides in FEPROM 0 on the XBUS, off the EISA bus, off PCI 0, off
IOD 0. See Figure 2-4. This path is minimally tested by SROM.


Figure 2-4 Console Code Critical Path (4100 Block Diagram)

The SROM contents are loaded into each CPU’s I-cache and executed on
power-up/reset. After testing the caches on each processor chip, the SROM tests
the path to the XSROM. Once this path is tested and deemed reliable, layers of the
XSROM are loaded sequentially into the processor chip on each CPU. None of the
SROM or XSROM power-up tests are run from memory; all run from the caches in
the CPU chip, which provides excellent diagnostic isolation. Later power-up tests,
run under the console, are used to complete testing of the I/O subsystem.


There are two console programs: the SRM console and the AlphaBIOS console, as 
detailed in the AlphaServer 4100 System Drawer User’s Guide (EK-4100A-UG) and 
the AlphaServer 4000 System Drawer User’s Guide (EK-4000A-UG). By default, 
the SRM console is always loaded and I/O system tests are run under it before the 
system loads AlphaBIOS. To load AlphaBIOS, the os_type environment variable 
must be set to NT and the Halt button should be out (LED not lit). Otherwise, the 
SRM console continues to run. 
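
The rule for which console is left running after power-up can be restated as a one-line decision. This Python sketch only paraphrases the paragraph above; the function and argument names are illustrative and do not correspond to any console command.

```python
def console_after_power_up(os_type: str, halt_button_in: bool) -> str:
    """Return the console program left running once power-up completes.

    The SRM console is always loaded first. AlphaBIOS is loaded from it only
    when os_type is set to NT and the Halt button is out (LED not lit).
    """
    if os_type.lower() == "nt" and not halt_button_in:
        return "AlphaBIOS"
    return "SRM"


if __name__ == "__main__":
    for os_type, halt_in in (("nt", False), ("nt", True), ("openvms", False)):
        print(os_type, halt_in, "->", console_after_power_up(os_type, halt_in))
```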
2.3 SROM Power-Up Test Flow


The SROM tests the CPU chip and the path to the XSROM.


Figure 2-5 SROM Power-Up Test Flow

The Alpha chip built-in self-test tests the I-cache at power-up and upon reset.


Each CPU chip loads its SROM code into its I-cache and starts executing it. If the 
chip is partially functional, the SROM code continues to execute. However, if the 
chip cannot perform most of its functions, that CPU hangs and that CPU pass/fail 
LED remains off. 


If the system has more than one CPU and at least one passes both the SROM and
XSROM power-up tests, the system will bring up the console. The console checks
the FW_SCRATCH register where evidence of the power-up failure is left. Upon
finding the error, the console sends these messages to COM1 and the OCP:

•  COM1 (or VGA): Power-up tests have detected a problem with your system
•  OCP: Power-up failure
Table 2-2 lists the tests performed by the SROM.


Table 2-2 SROM Tests

Test Name                    Logic Tested

D-cache RAM March test       D-cache access, D-cache data, D-cache address logic

D-cache Tag RAM March test   D-cache tag store RAM, D-cache bank address logic

S-cache Data March test      S-cache RAM cells, S-cache data path, S-cache address
                             path

S-cache Tag RAM March test   S-cache tag store RAM, S-cache bank address logic

I-cache Parity Error test    I-cache parity error detection, ISCR register and error
                             forcing logic, IC_PERR_STAT register and reporting
                             logic

D-cache Parity Error test    D-cache parity error detection, DC_MODE register and
                             parity error forcing logic, DC_PERR_STAT register and
                             reporting logic

S-cache Parity Error test    S-cache parity error detection, AC_CTL register and
                             parity error forcing logic, SC_STAT register and
                             reporting logic

IOD Access test              Access to IOD CSRs, data path through CAP chip and
                             MDP0 on each IOD, PCI0 A/D lines <31:0>
2.4 SROM Errors Reported


The SROM reports machine checks, pending interrupt/exception errors, and
errors related to corruption of FEPROM 0. If SROM errors are fatal, the
particular CPU will hang and only the CPU self-test pass LEDs and/or the LEDs
on the system bus to PCI bus bridge module will indicate the failure.


Example 2-1 SROM Errors Reported at Power-Up


Unexpected Machine Check (CPU Error) 
UNEX MCHK on CPU 0 

EXC_ADR 42a9 

EI STAT f£f£fffffOO04tffrfret 

EI ADDR f£ffFfFfO00000801£ 
SC_STAT 0 

SC_ADDR FFFFFFOOOQOO0O05F2F 


Pending Interrupt/Exception (CPU Error) 
INT-EXC on CPU0

ISR 400000 

EI STAT f£f£fffffLOO7FLLLeL 

EI ADDR ffffffr7fffrffffdF 


FIL SYN 631B 
BCTGADR ffffffal7fffcafff 


FEPROM Failures (PCI Motherboard Error) 


Sctr 0 -XSROM headr PTTRN fail 
Sctr 0 -XSROM headr CHKSM fail 
Sctr 0 -XSROM code CHKSM fail 
Sctr 2 -XSROM headr PTTRN fail 
Sctr 2 -XSROM headr CHKSM fail 
Sctr 2 -XSROM code CHKSM fail 
2.5 XSROM Power-Up Test Flow


Once the SROM has completed its tests and verified the path to the FEPROM
containing the XSROM code, it loads the first 8 Kbytes of XSROM into the
primary CPU’s S-cache and jumps to it.


Figure 2-6 XSROM Power-Up Flowchart


XSROM tests are described in Table 2-3. Failure indicates a CPU failure.

After jumping to the primary CPU’s S-cache, the code then intentionally I-caches
itself and is completely register based (no D-stream for stack or data storage is used). 
The only D-stream accesses are writes/reads during testing. 


Each FEPROM has sixteen 64-Kbyte sectors. The first sector contains B-cache tests, 
memory tests, and a fail-safe loader. The second sector contains PALcode. The 
third sector contains a copy of the first sector. The remaining thirteen sectors 
contain the SRM console and decompression code. 
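
The sector layout just described (which matches the contents given for FEPROM 0 in Section 2.2) can be captured in a short table. The Python sketch below assumes the sixteen sectors are laid out contiguously from offset 0 of the part, which the text does not state explicitly; all names are illustrative.

```python
SECTOR_SIZE = 64 * 1024   # each sector is 64 Kbytes
SECTOR_COUNT = 16         # sixteen sectors, 1 Mbyte in total


def sector_contents(sector: int) -> str:
    """Describe what a given FEPROM 0 sector holds, per the text above."""
    if not 0 <= sector < SECTOR_COUNT:
        raise ValueError("sector must be 0 through 15")
    if sector == 0:
        return "XSROM: B-cache tests, memory tests, fail-safe loader"
    if sector == 1:
        return "PALcode"
    if sector == 2:
        return "XSROM: copy of sector 0"
    return "SRM console and decompression code"


def sector_byte_range(sector: int) -> range:
    """Byte offsets covered by a sector (contiguous layout is an assumption)."""
    if not 0 <= sector < SECTOR_COUNT:
        raise ValueError("sector must be 0 through 15")
    start = sector * SECTOR_SIZE
    return range(start, start + SECTOR_SIZE)


if __name__ == "__main__":
    for n in range(SECTOR_COUNT):
        r = sector_byte_range(n)
        print(f"sector {n:2d}  {r.start:#07x}-{r.stop - 1:#07x}  {sector_contents(n)}")
```

The sector numbers used here are the same ones that appear in the "Sctr n" failure messages of Examples 2-1 and 2-2.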


NOTE: Memory tests are run during power-up and reset (see Table 2-4). They are 
also affected by the state of the memory_test environment variable, which can have 
the following values: 


FULL Test all memory 
PARTIAL Test up to the first 256 Mbytes 
NONE Test 32 Mbytes 
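
As a rough aid for estimating power-up time, the amount of memory covered by each memory_test setting can be computed as follows. This Python sketch is hypothetical: the function name is invented, and capping the NONE and PARTIAL values at the installed size is an assumption for systems with very little memory.

```python
def megabytes_tested(memory_test: str, installed_mb: int) -> int:
    """Return how many Mbytes the power-up memory tests cover."""
    limits = {"FULL": None, "PARTIAL": 256, "NONE": 32}
    key = memory_test.upper()
    if key not in limits:
        raise ValueError("memory_test must be FULL, PARTIAL, or NONE")
    limit = limits[key]
    # FULL has no limit; the other settings test up to a fixed amount.
    return installed_mb if limit is None else min(installed_mb, limit)


if __name__ == "__main__":
    for setting in ("FULL", "PARTIAL", "NONE"):
        print(setting, megabytes_tested(setting, 1024))
```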


Table 2-3 XSROM Tests

Test  Test Name                        Logic Tested

11    B-cache Tag Data Line test       Access to B-cache tags, shorts between tag data
                                       and its status and parity bits

12    B-cache Tag March test           B-cache tag store RAMs, B-cache STAT store
                                       RAMs

13    B-cache Data Line test           B-cache data lines to B-cache data RAMs,
                                       B-cache read/write logic

14    B-cache Data March test          B-cache data RAMs, CPU chip B-cache
                                       control, CPU chip B-cache address decode,
                                       INDEX_H<2x:6> (address bus)

15    B-cache ECC Data Line test       CPU chip ECC generation and checking logic,
                                       ECC lines from CPU chip to B-cache, B-cache
                                       ECC RAMs

16    B-cache Data ECC March test      Portion of B-cache data RAMs used for ECC

17    CPU chip ECC Single/Double bit   CPU chip ECC single-bit error detection and
      Error test                       correction, ECC double-bit error detection,
                                       ECC error reporting

18    B-cache Tag Store Parity Error   B-cache tag array, CPU parity detection,
      test                             EI_ADDR and EI_STAT register operation

19    B-cache STAT Store Parity Error  B-cache STAT array, CPU chip B-cache STAT
      test                             parity generation/detection
Table 2-4 Memory Tests

Test  Test Name         Logic Tested               Description

20    Memory Data       Data path to and from      01-FF. Errors are reported
      test              memory                     as an 8-bit binary field. A set
                        Data path on memory and    bit indicates a module failure.
                        RAMs                       Bit <0> indicates pass/fail of
                                                   MEM0_L; <1> indicates
                                                   pass/fail of MEM0_H; <2>
                                                   indicates pass/fail of
                                                   MEM1_L; <7> indicates
                                                   pass/fail of MEM3_H.

21    Memory            Address path to and from   Same as test 20.
      Address test      memory
                        Address path on memory
                        and RAMs

23*   Memory Bitmap     No new logic               Maps out bad memory by
      Building                                     way of the bitmap. It does not
                                                   completely fail memory.

24    Memory            No new logic               Maps out bad memory.
      March test

* There is no test 22.
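
The 8-bit error field described for tests 20 and 21 can be decoded into module names. Table 2-4 gives the meaning of bits <0>, <1>, <2>, and <7> only; the Python sketch below assumes the bits in between continue the same low/high pattern across memory pairs 1 through 3. The function name is illustrative.

```python
def failed_modules(error_field: int) -> list[str]:
    """Return the memory modules whose bit is set in the 8-bit error field."""
    if not 0 <= error_field <= 0xFF:
        raise ValueError("error field must fit in 8 bits")
    # Bit order: MEM0_L, MEM0_H, MEM1_L, MEM1_H, ... MEM3_H.
    # Bits <3> through <6> are inferred from the pattern, not stated in the table.
    names = [f"MEM{pair}_{half}" for pair in range(4) for half in ("L", "H")]
    return [name for bit, name in enumerate(names) if error_field & (1 << bit)]


if __name__ == "__main__":
    print(failed_modules(0x01))
    print(failed_modules(0x84))
```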
2.6 XSROM Errors Reported


The XSROM reports B-cache test errors and memory test errors. It also
reports a warning if memory is illegally configured.


Example 2-2 XSROM Errors Reported at Power-Up


B-cache Error (CPU Error)

TEST ERR on cpu0              #CPU running the test
FRU cpu0
err# 2
tst# 11
exp: 5555555555555555         #Expected data
rcv: aaaaaaaaaaaaaaaa         #Received data
adr: fffTf8                   #B-cache location error
                              #occurred


Memory Error (Memory Module Indicated)

20.21 4-.
TEST ERR on cpu0              #CPU running test
FRU: MEM1L                    #Low member of memory pair 1
err# c
tst# 21

22..23..24..Memory testing complete on cpu0


Memory Configuration Error (Operator Error)

ERR! mem_pair0 misconfigured
ERR! mem_pair1 card size mismatch
ERR! mem_pair1 card type mismatch
ERR! mem_pair1 EMPTY


FEPROM Failures (PCI Motherboard Error)

Sctr 1 -PAL headr PTTRN fail
Sctr 1 -PAL headr CHKSM fail
Sctr 1 -PAL code CHKSM fail
Sctr 3 -CONSLE headr PTTRN fail
Sctr 3 -CONSLE headr CHKSM fail
Sctr 3 -CONSLE code CHKSM fail
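
When power-up output is captured from COM1, the FEPROM failure lines shown in Examples 2-1 and 2-2 follow a regular pattern and can be picked out with a regular expression. The Python sketch below is a log-scanning aid only; the pattern is derived from the example lines, and reading "headr", "PTTRN", and "CHKSM" as header, pattern, and checksum is an interpretation of the abbreviations.

```python
import re

# Matches lines such as "Sctr 3 -CONSLE code CHKSM fail".
_MESSAGE = re.compile(r"Sctr\s+(\d+)\s+-(\w+)\s+(headr|code)\s+(PTTRN|CHKSM)\s+fail")


def parse_feprom_failure(line: str):
    """Return the fields of a FEPROM failure line, or None if it does not match."""
    match = _MESSAGE.fullmatch(line.strip())
    if match is None:
        return None
    sector, image, part, check = match.groups()
    return {"sector": int(sector), "image": image, "part": part, "check": check}


if __name__ == "__main__":
    captured = [
        "Sctr 0 -XSROM headr PTTRN fail",
        "TEST ERR on cpu0",
        "Sctr 3 -CONSLE code CHKSM fail",
    ]
    for text in captured:
        print(text, "->", parse_feprom_failure(text))
```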
2.7 Console Power-Up Tests


Once the SRM console is loaded, it does further testing of each IOD. Table 2-5
describes the IOD power-up tests, and Table 2-6 describes the PCI motherboard
power-up tests.


Table 2-5 IOD Tests

Test
Number  Test Name               Description

1       IOD CSR Access test     Read and write all CSRs in each IOD.

2       Loopback test           Dense space writes to the IOD’s PCI dense
                                space to check the integrity of ECC lines on
                                the IODs.

3       ECC test                Loopback tests similar to test 2 but with a
                                varying pattern to create an ECC of 0s.
                                Single- and double-bit errors are checked.

4       Parity Error and Fill   Parity errors are forced on the address and
        Error tests             data lines on system bus and PCI buses. A
                                fill error transaction is forced on the system
                                bus.

5       Translation Error test  A loopback test using scatter/gather address
                                translation logic on each IOD.

6       Write Pending test      Runs test 2 with the write-pending bit set
                                and clear in the CAP chip control register.

7       PCI Loopback test       Loops data through each PCI on each IOD,
                                testing the mask field of the system bus.

8       PCI Peer-to-Peer        Tests that devices on the same PCI and on
        Byte Mask test          different PCIs can communicate.
Table 2-6 PCI Motherboard Tests (B3050 only)

Test                           Diagnostic
Number  Test Name              Name          Description

1       PCEB                   pceb_diag     Tests the PCI to EISA bridge chip

2       ESC                    esc_diag      Tests the EISA system controller

3       8K NVRAM               nvram_diag    Tests the NVRAM

4       Real-Time Clock        ds1287_diag   Tests the real-time clock chip

5       Keyboard and Mouse     i8242_diag    Tests the keyboard/mouse chip

6       Flash ROM              flash_diag    Dumps contents of flash ROM

7       Serial and Parallel    combo_diag    Tests COM ports 1 and 2, the
        Ports and Floppy                     parallel port, and the floppy

8       CD-ROM                 ncr810_diag   Tests the CD-ROM controller


For both IOD tests and PCI 0 and PCI 1 tests, trace and failure status is sent to the
OCP. If any of these tests fail, a warning is sent to the SRM console device after the
console prompt (or AlphaBIOS pop-up box). The LEDs on the system bus to PCI
bus bridge module are controlled by the diagnostics. If an LED is off, a failure
occurred.
2.8 Console Device Determination


After the SROM and XSROM have completed their tasks, the SRM console
program, as it starts, determines where to send its power-up messages.


Figure 2-7 Console Device Determination Flowchart

Console Device Options


The console device can be either a serial terminal or a graphics monitor. 
Specifically: 


•  A serial terminal connected to COM1 off the server control module. The
terminal connected to COM1 must be set to 9600 baud. This baud rate cannot
be changed.

•  A graphics monitor off an adapter on PCI0.


Systems running Windows NT must have a graphics monitor as the console device 
and run AlphaBIOS as the console program. 


During power-up, the SROM and the XSROM always send progress and error
messages to the OCP and to the COM1 serial port if the SRM console environment
variable (set with the set console command) is set to serial. If the console
environment variable is set to graphics, no messages are sent to COM1.


If the console device is connected to COM1, the SROM, XSROM, and console 
power-up messages are sent to it once it has been initialized. If the console device is 
a graphics device, console power-up messages are sent to it, but SROM and XSROM 
power-up messages are lost. No matter what the console environment variable 
setting, each of the three programs sends messages to the control panel display. 


Messages       Console Set to
Sent By        Serial     Graphics

SROM           COM1       Lost
XSROM          COM1       Lost
SRM console    COM1       VGA
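
The routing in the table above, together with the rule that all three programs always write to the control panel display, can be expressed as a lookup. The Python sketch below is illustrative; the function name and the destination labels are assumptions chosen to mirror the table.

```python
def power_up_destinations(program: str, console: str) -> list[str]:
    """Return where a program's power-up messages appear for a console setting."""
    # None means the messages are lost on the console device.
    routes = {
        ("SROM", "serial"): "COM1",
        ("SROM", "graphics"): None,
        ("XSROM", "serial"): "COM1",
        ("XSROM", "graphics"): None,
        ("SRM console", "serial"): "COM1",
        ("SRM console", "graphics"): "VGA",
    }
    key = (program, console)
    if key not in routes:
        raise ValueError("unknown program or console setting")
    # Every program sends messages to the control panel display regardless.
    destinations = ["control panel display"]
    if routes[key] is not None:
        destinations.append(routes[key])
    return destinations


if __name__ == "__main__":
    for program in ("SROM", "XSROM", "SRM console"):
        for setting in ("serial", "graphics"):
            print(program, setting, power_up_destinations(program, setting))
```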


Changing Where Console Output Is Displayed 


You can change where console output is displayed, assuming the SRM console has 
fully powered up and the os_type environment variable is set to openvms or unix. 
(The following does not work if os_type is set to nt.) 


If the console environment variable is set to serial and no serial terminal is attached
to COM1, pressing a carriage return on a graphics monitor attached to the system
makes it the console device and the console prompt is sent to it. If the console
environment variable is set to graphics and no graphics monitor is attached to the
adapter, pressing a carriage return on a serial terminal attached to COM1 makes it
the console device and the console prompt is sent to it.
2.9 Console Power-Up Display


The entire power-up display prints to a serial terminal (if the console 
environment variable is set to serial), and parts of it print to the control panel 
display. The last several lines print to either a serial terminal or a graphics 
monitor. 


Example 2-3 Power-Up Display 


SROM V1.0 on cpu0                          (1)
SROM V1.0 on cpu1
SROM V1.0 on cpu2
SROM V1.0 on cpu3
XSROM V1.0 on cpu2                         (2)
XSROM V1.0 on cpu1
XSROM V1.0 on cpu3
XSROM V1.0 on cpu0
BCache testing complete on cpu2            (3)
BCache testing complete on cpu0
BCache testing complete on cpu3
BCache testing complete on cpu1
mem_pair0 - 128 MB                         (4)
mem_pair1 - 128 MB

DO ec 20M IO, OU DOO. OT Oe AY On OA, Oa LD 
Memory testing complete on cpu0
Memory testing complete on cpu1
Memory testing complete on cpu3
Memory testing complete on cpu2

At power-up or reset, the SROM code on each CPU module is loaded into
that module’s I-cache and tests the module. If all tests pass, the processor’s 
LED lights. If any test fails, the LED remains off and power-up testing 
terminates on that CPU. 


The first determination of the primary processor is made, and the primary 
processor executes a loopback test to each PCI bridge. If this test passes, the 
bridge LED lights. If it fails, the LED remains off and power-up continues. 
The EISA system controller, PCI-to-EISA bridge, COM1 port, and control 
panel port are all initialized thereafter. 


Each CPU prints an SROM banner to the device attached to the COM1 port
and to the control panel display. (The banner prints to the COM1 port if the
console environment variable is set to serial. If it is set to graphics, nothing
prints to the console terminal, only to the control panel display, until the
primary CPU starts the console.)


Each processor's S-cache is initialized, and the XSROM code in the FEPROM 
on the PCI 0 is unloaded into them. (If the unload is not successful, a copy is 
unloaded from a different FEPROM sector. If the second try fails, the CPU 
hangs.) 


Each processor jumps to the XSROM code and sends an XSROM banner to 
the COM1 port and to the control panel display. 


The three S-cache banks on each processor are enabled, and then the 
B-cache is tested. If a failure occurs, a message is sent to the COM1 port and 
to the control panel display. 


Each CPU sends a B-cache completion message to COM1. 


The primary CPU is again determined, and it sizes memory by reading
memory registers on the I²C bus.


The information on memory pairs is sent to COM1. If an illegal memory
configuration is detected, a warning message is sent to COM1 and the control
panel display.


Memory is initialized and tested, and the test trace is sent to COM1 and the
control panel display. Each CPU participates in the memory testing. The
numbers for tests 20 and 21 might appear interspersed, as in Example 2-3.
This is normal behavior. Test 24 can take several minutes if the memory is
very large. The message “P0 TEST 24 MEM**” is displayed on the control
panel display; the second asterisk rotates to indicate that testing is continuing.
If a failure occurs, a message is sent to the COM1 port and to the control
panel display.


Each CPU sends a test completion message to COM1.


Example 2-3 Power-Up Display (Continued)


starting console on CPU 0
sizing memory
  0    128 MB SYNC
  1    128 MB SYNC
starting console on CPU 1
starting console on CPU 2
starting console on CPU 3
probing IOD1 hose 1                        (8)
  bus 0 slot 1 - NCR 53C810
  bus 0 slot 2 - DECchip 21041-AA
  bus 0 slot 3 - NCR 53C810
  bus 0 slot 4 - DECchip 21040-AA
probing IOD0 hose 0
  bus 0 slot 1 - PCEB
Configuring I/O adapters...
AlphaServer 4100 Console V1.0, 13-MAR-1996 18:18:26    (9)
P00>>>

The final primary CPU determination is made. The primary CPU unloads
PALcode and decompression code from the FEPROM on PCI 0 to its
B-cache. The primary CPU then jumps to the PALcode to start the SRM
console. 


The primary CPU prints a message indicating that it is running the console. 
Starting with this message, the power-up display is printed to the default 
console terminal, regardless of the state of the console environment variable. 
(If console is set to graphics, the display from here to the end is saved in a 
memory buffer and printed to the graphics monitor after the PCI buses are 
sized and the graphics device is initialized.) 


The size and type of each memory pair is determined. 


The console is started on each of the secondary CPUs. A status message 
prints for each CPU. 


The PCI bridges (indicated as IODn) are probed and the devices are reported.
I/O adapters are configured.


The SRM console banner and prompt are printed. (The SRM prompt is shown
in this manual as P00>>>. It can, however, be P01>>>, P02>>>, or P03>>>.
The number indicates the primary processor.) If the auto_action environment
variable is set to boot or restart and the os_type environment variable is set
to unix or openvms, the DIGITAL UNIX or OpenVMS operating system
boots.


If the system is running the Windows NT operating system (the os_type 
environment variable is set to nt), the SRM console loads and starts the 
AlphaBIOS console and does not print the SRM banner or prompt. 
2.10 Fail-Safe Loader


The fail-safe loader is a software routine that loads the SRM console image from
floppy. Once the console is running, run LFU to update FEPROM 0 with a new
image.


NOTE: FEPROM 0 contains images of the SROM, XSROM, PAL, decompression, 
and SRM console code. 


If the fail-safe loader loads, the following conditions exist on the machine: 


•  The SROM has passed its tests and successfully unloaded the XSROM. If the
SROM fails to unload both copies of XSROM, it reports the failure to the
control panel display and COM1 if possible, and the system hangs.

•  The XSROM has completed its B-cache and memory tests but has failed to
unload the PALcode in FEPROM 0 sector 1 or the SRM console code.

•  The XSROM reports the errors encountered and loads the fail-safe loader.
Chapter 3
Troubleshooting


This chapter describes troubleshooting during power-up and booting, as well as
diagnostics for AlphaServer 4000/4100 systems. The following topics are covered:

•  Troubleshooting with LEDs
•  Troubleshooting Power Problems
•  Running Diagnostics: Test Command


3.1 Troubleshooting with LEDs 


During power-up, reset, initialization, or testing, diagnostics are run on CPUs, 
memories, bridge modules, PCI motherboards, and sometimes options. The 
following sections describe possible problems that can be identified by checking 
LEDs. 


Figure 3-1 CPU and Bridge Module LEDs

Bridge Module LEDs (IOD 0 & 1): IOD0 Self-Test Pass, IOD1 Self-Test Pass, Power, Temp OK
Bridge Module LEDs (IOD 2 & 3): IOD2 Self-Test Pass, IOD3 Self-Test Pass
CPU LEDs: DC_OK, SROM Oscillator, CPU Self-Test Pass, Regulator OK (EV56)


CPU LEDs


If the CPU STP LED on any CPU module is lit, that CPU chip is functioning 
properly. If the operating system is NT and the CPU STP LED is off, that CPU 
may or may not be functioning. 


You can use the Halt button on the OCP to prevent the AlphaBIOS console 
(which turns off the CPU STP LED) from booting, thus assuring the validity of 
the CPU STP LED. If the LED is off, replace the CPU. If the LED is lit, you 
can use the SRM console command alphabios to load and run the AlphaBIOS 
console. 


The top LED on a CPU module is a DC OK LED. It is driven by the PCM 
module. If it is not lit, there are probably power problems. 


The second from the top LED on a CPU lights only when the SROM on the 
CPU is loaded. 


On modules with EV56 CPU processors a fourth LED is present at the bottom of 
the column. The LED is normally on indicating that the power regulator on the 
module is working properly. If the LED is off, replace the module. 


System Bus to PCI Bus Bridge Module LEDs (B3040-AA) 
There are four LEDs on the B3040-AA system bus to PCI bus bridge module: 


The top two LEDs indicate the condition of the bridge module. If either is off, 
the module should be replaced. 


The bottom two LEDs are passed from the PCM. Both should be on during 
normal operation. If either is off while the system is on, the LEDs on the PCM 
module should indicate what failed. If they do not, the PCM could be broken or 
the bridge module is not passing the signals to the LEDs. 


NOTE: If AC power is applied and the system is off and a power supply is in 
operation, the power LED, the top one of the bottom two, flashes, indicating the 
presence of Vaux (auxiliary voltage). 


System Bus to PCI Bus Bridge Module LEDs (B3040-AB) 
There are two LEDs on the B3040-AB system bus to PCI bus bridge module: 


The two LEDs indicate the condition of the bridge module. If either is off, the 
module should be replaced. 
3.1.1 Cabinet Power and Fan LEDs


Figure 3-2 Cabinet Power and Fan LEDs

A cabinet system has three exhaust fans at the top of the cabinet. They are powered
from a small power supply in the fan tray. This power supply also powers the server 
control module at the bottom of the PCI card cage to allow remote access to the 
system. A failure of the power supply is indicated only by the LEDs. No messages 
are displayed. 


There are two LEDs on the top panel: a fan LED and a power LED. 


•  When the fan LED (amber) is flashing, a cabinet fan needs replacing. Look to
see which fan appears broken: it is either not turning at all or turning more
slowly than the others.

•  When the power LED (green) is off, either the power supply in the fan tray is
broken or there is a power problem.

3.2 Troubleshooting Power Problems


Power problems can occur before the system is up or while the system is 
running. If a system stops running, make a habit of checking the PCM. 


Power Problem List 


The system will halt for the following: 


1. A CPU fan failure
2. A system fan failure
3. An overtemperature condition
4. Power supplied out of tolerance
5. Circuit breaker(s) tripped
6. AC problem
7. Interlock switch activation or failure
8. PCM failure
9. Environmental electrical failure or
   unrecoverable system fault
   with auto_action ev = halt or boot
10. Operator error - failure to unplug all power
    supplies and let Vaux drain (10 sec delay)
    before restarting
11. Cable failure
12. Module failure - system motherboard, PCI
    motherboard, or system bus to PCI bus bridge
13. SCM breaking the interlock circuit


Indications of failure: 


1. Power control module LEDs indicate 
CPU fan, system fan, overtemperature, 
and power supply failures 

2. Circuit breaker(s) tripped 


No obvious indications for failures 7 - 13 
from the power system. 
If Halt Is Caused by Power, Fan, or Overtemperature


If a system is stopped because of a power, fan, or overtemperature problem, use the
PCM LEDs to diagnose the problem. See Section 3.2.1.

If Power Problem Occurs at Power-Up


If the system has a power problem on a cold start, the PCM LEDs are not valid until 
after DCOK_SENSE has been asserted. The cause is one of the following: 


• Broken system fan
• Broken CPU fan
• Power supplied to the system is out of tolerance (a power supply could be
  broken and the system could still power up)
• PCM failure
• Interlock failure
• Wire problems
• Temperature problem (unlikely)


Recommended Order for Troubleshooting Failure at Power-Up 


1. Check to see if any CPU fan or system fan is not spinning. Fans can fail by not 
spinning and/or not putting out the tachometer output necessary as input to the 
PCM comparator that checks the fans. (See steps 4 and 5.) Replace broken fan. 


2. Replace the PCM. 


3. Remove the CPUs one at a time, and try to power up after each removal. If the
system powers up, the last CPU you removed had a fan failure.


4. Check the output of the power supplies. See Section 4.1 for locations of +5 and 
+3.43 volt output pins. If the output is above or below the threshold, replace the 
faulty power supply. 


5. Check the output of each system fan with a voltmeter. Probe the middle of three 
outputs of the fans with the positive lead of the meter and ground the other 
probe. The meter should read 2.5 volts to 3 volts. If a fan’s output is out of this 
range, replace the fan. 


NOTE: You will have to disable the interlocks to check the voltages in step 5. 
You will have only 10 seconds to measure them. There is a 10-second delay 
before the PCM turns off the power. 


The PCM must sense a change in Vaux (auxiliary voltage) to start the power 
supplies. Pressing the On button has no effect if the machine halted because of 
a failure in the power system. The power supplies must be unplugged and 
plugged back in for the On button to work. 
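The voltmeter check in step 5 can be written down as a small range test. The following is only a sketch: the function names are hypothetical, and the limits are simply the 2.5 volt to 3 volt range quoted in step 5.

```python
# Limits quoted in step 5: a healthy fan output reads 2.5 V to 3 V.
FAN_OUTPUT_MIN_V = 2.5
FAN_OUTPUT_MAX_V = 3.0


def fan_output_ok(volts: float) -> bool:
    """Return True if a measured fan output is inside the 2.5 V to 3 V range."""
    return FAN_OUTPUT_MIN_V <= volts <= FAN_OUTPUT_MAX_V


def fans_to_replace(readings: dict) -> list:
    """Return the names of fans whose measured output is out of range.

    `readings` maps a fan name (hypothetical labels) to the measured voltage.
    """
    return [name for name, volts in readings.items() if not fan_output_ok(volts)]


if __name__ == "__main__":
    measured = {"system fan 0": 2.7, "system fan 1": 0.4, "system fan 2": 3.0}
    print(fans_to_replace(measured))
```

Recording all readings first and evaluating them afterward fits the 10-second measurement window described in the note.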
3.2.1 Power Control Module LEDs


The PCM has 11 LEDs visible through the system card cage. The LED display
shows the relative placement of the LEDs.


Figure 3-3 PCM LEDs

Table 3-1 Power Control Module LED States


LED          State  Description
DCOK_SENSE   On     Both +5.0V and +3.43V are present and within limits.
PS0_OK       On     Power supply 0 is present and has asserted POK_H.
PS1_OK       On     Power supply 1 is present and has asserted POK_H.
             Off    Power supply 1 not present.
PS2_OK       On     Power supply 2 is present and has asserted POK_H.
             Off    Power supply 2 not present.
TEMP_OK      On     The system temperature is below 55°C.
CPUFAN_OK    On     All CPU fans are OK.
             Off    A CPU fan has failed. The specific fan is identified by the
                    CS_FANx or C_FAN3 LED that remains lit.
SYSFAN_OK    On     All system fans are OK.
             Off    A system fan has failed. The specific fan is identified by
                    the CS_FANx LED that remains lit.
CS_FAN0      On     CPU fan 0 and system fan 0 are being sampled or one of
                    them has failed as indicated by CPUFAN_OK and
                    SYSFAN_OK.
             Off    CPU fan 0 and system fan 0 are not being sampled and are
                    functioning properly.
CS_FAN1      On     CPU fan 1 and system fan 1 are being sampled or one of
                    them has failed as indicated by CPUFAN_OK and
                    SYSFAN_OK.
             Off    CPU fan 1 and system fan 1 are not being sampled and are
                    functioning properly.
CS_FAN2      On     CPU fan 2 and system fan 2 are being sampled or one of
                    them has failed as indicated by CPUFAN_OK and
                    SYSFAN_OK.
             Off    CPU fan 2 and system fan 2 are not being sampled and are
                    functioning properly.
C_FAN3       On     CPU fan 3 is being sampled or has failed as indicated by
                    CPUFAN_OK and SYSFAN_OK.
             Off    CPU fan 3 and system fan 3 are not being sampled and
                    are functioning properly.
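As a reading aid, the LED states in Table 3-1 can be turned into a list of findings. This is a minimal sketch, not a diagnostic tool: the function name and the wording of the findings are hypothetical, the meaning of an unlit DCOK_SENSE, PS0_OK, or TEMP_OK LED is inferred as the opposite of the "On" description, and fan 3 is treated as a CPU fan only, which is an assumption based on the C_FAN3 LED name.

```python
# Fan-sample LEDs from Table 3-1, mapped to the fan number they identify.
FAN_LEDS = {"CS_FAN0": 0, "CS_FAN1": 1, "CS_FAN2": 2, "C_FAN3": 3}


def diagnose_pcm_leds(lit: dict) -> list:
    """Turn PCM LED states (True = lit) into a list of findings."""
    findings = []
    if not lit.get("DCOK_SENSE", False):
        # Inferred: the table only describes the "On" state.
        findings.append("+5.0V or +3.43V is missing or out of limits")
    for n in (0, 1, 2):
        if not lit.get(f"PS{n}_OK", False):
            findings.append(f"power supply {n} is not present or has not asserted POK_H")
    if not lit.get("TEMP_OK", False):
        findings.append("system temperature is not below 55 C")
    # A fan LED that remains lit identifies the failed fan.
    lit_fans = [n for led, n in FAN_LEDS.items() if lit.get(led, False)]
    if not lit.get("CPUFAN_OK", False):
        findings += [f"CPU fan {n} has failed" for n in lit_fans] or ["a CPU fan has failed"]
    if not lit.get("SYSFAN_OK", False):
        # Assumption: C_FAN3 covers a CPU fan only, so fan 3 is skipped here.
        sys_fans = [n for n in lit_fans if n != 3]
        findings += [f"system fan {n} has failed" for n in sys_fans] or ["a system fan has failed"]
    return findings


if __name__ == "__main__":
    state = {"DCOK_SENSE": True, "PS0_OK": True, "PS1_OK": True, "PS2_OK": True,
             "TEMP_OK": True, "CPUFAN_OK": False, "SYSFAN_OK": True, "CS_FAN1": True}
    print(diagnose_pcm_leds(state))
```

Remember that the LEDs are not valid on a cold start until DCOK_SENSE has been asserted, so a sketch like this applies only to a system that had been running.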
3.3 Maintenance Bus (I²C Bus)


The I²C bus (referred to as the “I squared C bus”) is a small internal
maintenance bus used to monitor system conditions scanned by the power
control module, write the fault display, store error state, and track
configuration information in the system. Although all system modules (not I/O
modules) sit on the maintenance bus, only the I²C controller accesses it.
Everything written or read on the I²C bus is done by the controller. The block
diagram below notes differences between the AlphaServer 4000 and 4100 with
respect to the I²C bus.


Figure 3-4 I²C Bus Block Diagram
Monitor


The I²C bus monitors the state of system conditions scanned by the PCM. There are
two registers on the PCM:


• One records the state of the fans and power supplies and is latched when there is
a fault.


• The other causes an interrupt on the I²C bus when a CPU or system fan fails, an
overtemperature condition exists, or power supplied to the system is out of
tolerance.


The interrupt received by the I²C bus controller on PCI 0 alerts the system of
imminent power shutdown. The controller has 30 seconds to read the two registers
and store the information in the EEPROM on the PCM. The SRM console command
show power reads these registers.

Fault Display

The OCP display is written through the I²C bus.

Error State

Error state is written and read for power conditions. The state of the Halt button
(in/out) is read on the I²C bus.

Configuration Tracking


Each CPU, PCI bridge, PCI motherboard, and system motherboard has an EEPROM
that contains information about the module that can be written and read over the I²C
bus. All modules contain the following information:


• Module type
• Module serial number
• Hardware revision
• Firmware revision
• Memory size (only required for memory modules)
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3.4 Running Diagnostics — Test Command 


The test command runs diagnostics on the entire system, CPU devices, memory 
devices, and the PCI I/O subsystem. The test command runs only from the SRM 
console. Ctrl/C stops the test. 


Example 3-1 Test Command Syntax 


P00>>> help test
FUNCTION


SYNOPSIS
     test [-q] [-t <time>] [option]
          where option is:
               cpun
               memn
               pcin
          and n can be one of 0, 1, 2, 3, or *.
     The entire system is tested by default if no option specified.


NOTE: If you are running the Microsoft Windows NT operating system, switch from 
AlphaBIOS to the SRM console in order to enter the test command. From the 
AlphaBIOS console, press in the Halt button (the LED will light) and reset the 
system, or select DIGITAL UNIX (SRM) or OpenVMS (SRM) from the Advanced 
CMOS Setup screen and reset the system. 


test [-t time] [-q] [option] 


-t time Specifies the run time in seconds. The default for system test is 600 
seconds (10 minutes). 


-q Disables the display of status messages as exerciser processes are 
started and stopped during testing. 


option Either cpun, memn, or pcin, where n is 0, 1, 2, 3, or *. If nothing is
specified, the entire system is tested.
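The syntax above can be captured in a small helper that assembles a valid command line before it is typed at the SRM console. This is a sketch only: `build_test_command` is a hypothetical name, and the validation follows just the help text shown here (cpun, memn, or pcin with n of 0 to 3 or *), so it does not accept other spellings such as the test memory form used in a later example.

```python
import re
from typing import Optional

# Options allowed by the help text: cpun, memn, or pcin, where n is 0-3 or *.
_OPTION_RE = re.compile(r"^(cpu|mem|pci)([0-3]|\*)$")


def build_test_command(option: Optional[str] = None,
                       seconds: Optional[int] = None,
                       quiet: bool = False) -> str:
    """Assemble a test command string from the documented arguments."""
    parts = ["test"]
    if seconds is not None:
        if seconds <= 0:
            raise ValueError("run time must be a positive number of seconds")
        parts += ["-t", str(seconds)]
    if quiet:
        parts.append("-q")
    if option is not None:
        if not _OPTION_RE.match(option):
            raise ValueError(f"unsupported option: {option!r}")
        parts.append(option)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_test_command())                      # whole system, default run time
    print(build_test_command("pci*", seconds=120))   # all PCI buses for 2 minutes
    print(build_test_command("mem0", quiet=True))    # one memory device, no status messages
```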
3.5 Testing an Entire System


A test command with no modifiers runs all exercisers for subsystems and devices 
on the system. I/O devices tested are supported boot devices. The test runs for 
10 minutes. 


Example 3-2 Sample Test Command 


P00>>> test
Console is in diagnostic mode
System test, runtime 600 seconds

Type ^C to stop testing

Configuring system..
polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1 SCSI Bus ID 7
dka500.5.0.1.1     DKa500     RRD45  1645
polling ncr1 (NCR 53C810) slot 3, bus 0 PCI, hose 1 SCSI Bus ID 7
dkb200.2.0.3.1     DKb200     RZ29B  0007
dkb400.4.0.3.1     DKb400     RZ29B  0007
polling floppy0 (FLOPPY) PCEB - XBUS hose 0
dva0.0.0.1000.0    DVA0       RX23
polling tulip0 (DECchip 21040-AA) slot 2, bus 0 PCI, hose 1
ewa0.0.0.2.1: 08-00-2B-E5-B4-1A

Testing EWA0 network device
Testing VGA (alphanumeric mode only)
Starting background memory test, affinity to all CPUs..
Starting processor/cache thrasher on each CPU..
Starting processor/cache thrasher on each CPU..
Starting processor/cache thrasher on each CPU..
Starting processor/cache thrasher on each CPU..
Testing SCSI disks (read-only)
No CD/ROM present, skipping embedded SCSI test
Testing other SCSI devices (read-only) ..
Testing floppy drive (dva0, read-only)
ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
00003047 memtest  memory               0    0     134217728   134217728
00003050 memtest  memory               0    0     213883392   213883392
00003059 memtest  memory               0    0     200253568   200253568
00003062 memtest  memory               0    0     200253568   200253568
00003084 memtest  memory               0    0      82827392    82827392
000030d8 exer_kid dkb200.2.0.3         0    0             0    13690880
000030d9 exer_kid dkb400.4.0.3         0    0             0    13674496
0000310d exer_kid dva0.0.0.100         0    0             0           0

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
00003047 memtest  memory               0    0     432013312   432013312
00003050 memtest  memory               0    0     664716032   664716032
00003059 memtest  memory               0    0     647940864   647940864
00003062 memtest  memory               0    0     648989312   648989312
00003084 memtest  memory               0    0     274693376   274693376
000030d8 exer_kid dkb200.2.0.3         0    0             0    47572992
000030d9 exer_kid dkb400.4.0.3         0    0             0
0000310d exer_kid dva0.0.0.100         0    0             0           0

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
00003047 memtest  memory               0    0     727711744   727711744
00003050 memtest  memory               0    0    1104015744  1104015744
00003059 memtest  memory               0    0    1088289024  1088289024
00003062 memtest  memory               0    0    1090385920  1090385920
00003084 memtest  memory               0    0     467607808   467607808
000030d8 exer_kid dkb200.2.0.3         0    0             0    81488896
000030d9 exer_kid dkb400.4.0.3         0    0             0    81472512
0000310d exer_kid dva0.0.0.100         0    0             0      607232

Testing aborted. Shutting down tests.
Please wait..

System test complete

^C
P00>>>


3.5.1 Testing Memory 


The test mem command tests individual memory devices or all memory. The 
test shown in Example 3-3 runs for 2 minutes. 


Example 3-3 Sample Test Memory Command 


P00>>> test memory
Console is in diagnostic mode
System test, runtime 120 seconds

Type ^C to stop testing

Starting background memory test, affinity to all CPUs..
Starting memory thrasher on each CPU..
Starting memory thrasher on each CPU..
Starting memory thrasher on each CPU..
Starting memory thrasher on each CPU..

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
000046d7 memtest  memory               0    0      48234496    48234496
000046e0 memtest  memory        122    0    0     126862208   126862208
000046e9 memtest  memory         11    0    0     115329280   115329280
000046f2 memtest  memory        109    0    0     113232384   113232384
000046fb memtest  memory          4    0    0      41937920    41937920

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
000046d7 memtest  memory               0    0     226492416   226492416
000046e0 memtest  memory        566    0    0     592373120   592373120
000046e9 memtest  memory        555    0    0     580840192   580840192
000046f2 memtest  memory        554    0    0     579791744   579791744
000046fb memtest  memory         21    0    0     220174080   220174080

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
000046d7 memtest  memory               0    0     404750336   404750336
000046e0 memtest  memory        101    0    0    1058932480  1058932480
000046e9 memtest  memory       1000    0    0    1047399552  1047399552
000046f2 memtest  memory        999    0    0    1046351104  1046351104
000046fb memtest  memory         38    0    0     398410240   398410240

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
000046d7 memtest  memory               0    0     583008256   583008256
000046e0 memtest  memory               0    0    1525491840  1525491840
000046e9 memtest  memory               0    0    1515007360  1515007360
000046f2 memtest  memory               0    0    1512910464  1512910464
000046fb memtest  memory               0    0     575597952   575597952

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
000046d7 memtest  memory               0    0     761266176   761266176
000046e0 memtest  memory               0    0    1992051200  1992051200
000046e9 memtest  memory               0    0    1982615168  1982615168
000046f2 memtest  memory               0    0    1979469824  1979469824
000046fb memtest  memory               0    0     753834112   753834112

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
000046d7 memtest  memory               0    0     937426944   937426944
000046e0 memtest  memory               0    0    2458610560  2458610560
000046e9 memtest  memory               0    0    2449174528  2449174528
000046f2 memtest  memory               0    0    2444980736  2444980736
000046fb memtest  memory               0    0     932070272   932070272

Memory test complete

Test time has expired...

P00>>>


3.5.2 Testing PCI 


The test pci command tests PCI buses and devices. The test runs for 2 minutes. 


Example 3-4 Sample Test Command for PCI


P00>>> test pci*
Console is in diagnostic mode
System test, runtime 120 seconds

Type ^C to stop testing

Configuring all PCI buses..
polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1 SCSI Bus ID 7
dka500.5.0.1.1     DKa500     RRD45  1645
polling ncr1 (NCR 53C810) slot 3, bus 0 PCI, hose 1 SCSI Bus ID 7
dkb200.2.0.3.1     DKb200     RZ29B  0007
dkb400.4.0.3.1     DKb400     RZ29B  0007
polling tulip0 (DECchip 21040-AA) slot 2, bus 0 PCI, hose 1
ewa0.0.0.2.1: 08-00-2B-E5-B4-1A
polling floppy0 (FLOPPY) PCEB - XBUS hose 0
dva0.0.0.1000.0    DVA0       RX23

Testing all PCI buses..
Testing EWA0 network device
Testing VGA (alphanumeric mode only)
Testing SCSI disks (read-only)
Testing floppy (dva0, read-only)

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
00002c29 exer_kid dkb200.2.0.3   27    0    0             0    14642176
00002c2a exer_kid dkb400.4.0.3   27    0    0             0    14642176
00002c5e exer_kid dva0.0.0.100    0    0    0             0           0

ID       Program  Device       Pass Hard/Soft Bytes Written  Bytes Read
00002c29 exer_kid dkb200.2.0.3   92    0    0             0    48689152
00002c2a exer_kid dkb400.4.0.3   92    0    0             0    48689152
00002c5e exer_kid dva0.0.0.100    0    0    0             0      286720

Testing aborted. Shutting down tests.
Please wait..
Testing complete

^C
P00>>>
Chapter 4
Power System 


This chapter describes the AlphaServer 4000/4100 power system: 


Power Supply 

Power Control Module Features 

Power Circuit and Cover Interlocks 

Power-Up/Down Sequencing 

Cabinet Power Configuration Rules 

Pedestal Power Configuration Rules (North America and Japan) 


Pedestal Power Configuration Rules (Europe and Asia Pacific) 
4.1 Power Supply


Power supply outputs are shown in Figure 4-1.


Figure 4-1 Power Supply Outputs 
Power Supply Features

• 90-264 Vrms input

• 450 watts output. Output voltages are as follows:

  Output Voltage   Min. Voltage (V)   Max. Voltage (V)   Max. Current (A)
  +5.0             4.85               5.25               50
  +3.43            3.400              3.465              75
  +12              11.5               12.6               11
  -12              -10.9              -13.2              0.2
  -5.0             -4.6               -5.5               0.2
  Vaux             8.5                9.5                0.05

• Remote sense on +5.0V and +3.43V

  +5.0V is sensed on all CPUs in the system, the system bus motherboard, and the
  PCI bus motherboard(s).

  +3.43V is sensed on all CPUs in the system and the system bus motherboard.

• Current share on +5.0V, +3.43V, and +12V.

• ±1% regulation on +3.43V.

• Fault protection (latched). If a fault is detected by the power supply, it will shut
  down. The faults detected are:

     Overvoltage
     Overcurrent
     Power overload

• DC_ENABLE_L input signal starts the DC outputs.

• POK_H output signal indicates that the power supply is operating properly.
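When checking supply outputs with a meter, the minimum and maximum values in the table above can be applied as a simple range test. The sketch below is illustrative only: the names are hypothetical, and the limits for the -5.0V output are taken to be negative (-4.6 to -5.5), which is an assumption about the table.

```python
# (low, high) limits in volts, ordered numerically for each output.
# Values come from the output voltage table; the -5.0 limits are assumed negative.
RAIL_LIMITS_V = {
    "+5.0": (4.85, 5.25),
    "+3.43": (3.400, 3.465),
    "+12": (11.5, 12.6),
    "-12": (-13.2, -10.9),
    "-5.0": (-5.5, -4.6),
    "Vaux": (8.5, 9.5),
}


def rail_in_tolerance(rail: str, measured_volts: float) -> bool:
    """Return True if the measured voltage is inside the limits for that output."""
    low, high = RAIL_LIMITS_V[rail]
    return low <= measured_volts <= high


def out_of_tolerance(readings: dict) -> list:
    """Return the outputs whose readings fall outside their limits."""
    return [rail for rail, volts in readings.items()
            if not rail_in_tolerance(rail, volts)]


if __name__ == "__main__":
    print(out_of_tolerance({"+5.0": 5.02, "+3.43": 3.31, "+12": 12.1}))
```

Note how narrow the +3.43V window is: it corresponds to the 1 percent regulation listed for that output.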
4.2 Power Control Module Features


The power control module (54-24117-01) is located behind the B3040-AA
module, the system bus to PCI bus bridge module.


Figure 4-2 Power Control Module
The power control module performs the following functions:


• Controls the power-up/down sequencing.


• Monitors the combined output of power supplies VDD (3.43V) and VCC (5.0V).
It asserts DCOK_SENSE if these voltages are within range, and asserts
POWER_FAULT_L, causing an immediate power shutdown, if either is not.


• Monitors system temperature and asserts TEMP_FAIL if the temperature exceeds
55°C.


• Monitors CPU and system drawer fans. It asserts CPUFAN_OK if all CPU fans
are functioning properly and SYSTEM_FAN_OK if the drawer cooling fans
are functioning properly; otherwise it asserts FAN_FAULT_L. Each fan is
checked at 1-second intervals.


• Powers down the system 30 seconds after detecting TEMP_FAIL, or the absence
of CPUFAN_OK, or the absence of SYSTEM_FAN_OK by asserting
POWER_FAULT_L.


• Provides visual indication of faults through LEDs.


• Has two registers, one that generates interrupts when bits change, and one that
latches errors but does not generate interrupts.
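The shutdown rules in the list above can be summarized as a small decision function. This is a sketch of the described behavior, not the PCM logic itself; the class, field, and return strings are hypothetical labels.

```python
from dataclasses import dataclass


@dataclass
class PcmInputs:
    """Conditions the PCM monitors (field names are illustrative)."""
    voltages_in_range: bool   # VDD (3.43V) and VCC (5.0V) both within range
    temp_fail: bool           # temperature exceeds 55 C
    cpufan_ok: bool
    system_fan_ok: bool


def pcm_action(state: PcmInputs) -> str:
    """Return the action the text describes for a given set of conditions."""
    if not state.voltages_in_range:
        # Out-of-range voltage asserts POWER_FAULT_L at once.
        return "immediate shutdown"
    if state.temp_fail or not state.cpufan_ok or not state.system_fan_ok:
        # Temperature and fan faults power the system down after a delay.
        return "shutdown after 30 seconds"
    return "run"


if __name__ == "__main__":
    print(pcm_action(PcmInputs(True, False, True, True)))
    print(pcm_action(PcmInputs(True, True, True, True)))
    print(pcm_action(PcmInputs(False, False, True, True)))
```

The 30-second delay is the window in which the system can read the PCM registers before power is removed.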
4.3 Power Circuit and Cover Interlocks


Figure 4-3 Power Circuit Diagram

Figure 4-3 shows the distribution of power throughout the system drawer.

An open in the circuit, the PCM signal POWER_FAULT_L, or the SCM signal
RSM_DC_EN_L interrupts DC power applied to the system. Opens can be
caused by the On/Off button or the cover interlocks. The PCM asserts the
POWER_FAULT_L signal if it detects a fault; the RSM_DC_EN_L signal is
controlled remotely.


A failure anywhere in the circuit will result in the removal of DC power. A potential 
failure is the relay used on the SCM modules to control the RSM_DC_EN_L signal. 


The 4100 and early 4000 system drawers have three cover interlocks: one for the 
system bus card cage, one for the PCI card cage, and one for the power and system 
fan area. Later 4000 system drawers have four cover interlocks; the fourth switch is 
for the second PCI card cage. 


To override the cover interlocks, find a suitable object to close the interlock circuit 
at the location identified in Figure 4-3. The switch assembly that contains single 
switches for all three covers is located where all three covers meet. 
4.4 Power-Up/Down Sequence


The On/Off button can be controlled manually or remotely. The button is on
the OCP. Remote power control is provided through the remote I/O port
connected to the PCI. The power-up/down sequence flow is shown below.


Figure 4-4 Power Up/Down Sequence Flowchart
(Flowchart boxes: Apply AC Power; Vaux on; On/Off Button; Assert DC_ENABLE_L;
Power Supply Starts; 10 Second Delay; Voltages OK; Assert DCOK_SENSE;
Fan/Temp OK; 30 Second Delay; 12 Second Delay; Deassert DCOK_SENSE;
Deassert DC_ENABLE_L; Halt. Figure ID PKW-0402-95.)


4-8 AlphaServer 4000/4100 Service Manual 


When AC is applied to the system, Vaux (auxiliary voltage) is asserted and is sensed 
by the PCM. The PCM asserts DC_ENABLE_L, starting the power supplies. If
there is a hard fault on power-up, the power supplies shut down immediately; 
otherwise, the power system powers up and remains up until the system is shut off or 
the PCM senses a fault. If a power fault is sensed, the power system attempts to 
restore power and will do so if the fault is not sensed a second time. If the fault is 
still present, the power system shuts down. 


Since Vaux is independent of the power supply start, the AC plugs at the front of the 
supplies must be removed to reset Vaux, allowing capacitors to drain voltage. All 
power failures require this procedure since the PCM must sense a change in Vaux to 
start the power supplies. 
4.5 Cabinet Power Configuration Rules


There are four cabinets with different power delivery systems. See page 1-9 for 
a description of differences. A barcode label designating the cabinet variation is 
located inside the back door in the upper left corner of the bezel holding the 
door. The four variations are: H9A10-EB, -EC, -EL, -EM. 


Figure 4-5 Simple -EB & -EC Cabinet Power Configuration 
Figure 4-6 Worst-Case -EB & -EC Cabinet Power Configuration

Single Drawer                  1100 VA
Single StorageWorks Shelf      150 VA
System Fan Tray                100 VA

Figure 4-7 -EL & -EM Single Drawer Cabinet Power Configuration
(Single drawer -EM shown with H7600-DB controller)

Figure 4-8 -EL Three Drawer Cabinet Power Configuration
(Three drawer -EL shown with H7600-AA controller)
4.6 Pedestal Power Configuration Rules (North
America and Japan)


Figure 4-9 Pedestal Power Distribution (N.A. and Japan)
(Diagram labels: Power Strips; StorageWorks; System Drawer; 15A; 0.75 Arms;
3.67 Arms; 100 - 120 Vrms, 11.0 Arms; 100 - 120 Vrms, 3.0 Arms.
Figure ID PKW0406B-95.)
Total Power Available          N. America: 1800 VA per branch circuit and 1400 VA
(Assuming a 15 A branch)       per line cord
                               Japan: 1500 VA per branch circuit and 1200 VA per
                               line cord
Single Drawer                  1100 VA
Single StorageWorks Shelf      150 VA
Outlets                        12 NEMA receptacles
Power Strip                    Single AC power strip supports one system drawer
                               and one StorageWorks shelf.
                               When two AC power strips are used, combined AC
                               input line current cannot exceed the site circuit
                               breaker restriction, assuming both strips are plugged
                               in to the same circuit.
4.7 Pedestal Power Configuration Rules (Europe
and Asia Pacific)


Figure 4-10 Pedestal Power Distribution (Europe and AP)
(Diagram labels: Power Strips; StorageWorks; System Drawer; 10A; 0.34 Arms;
1.67 Arms; 200 - 240 Vrms, 5.0 Arms; 200 - 240 Vrms, 3.0 Arms.
Figure ID PKW0406C-95.)
Total Power Available          2200 VA per power strip
Single Drawer                  1100 VA
Single StorageWorks Shelf      150 VA
Outlets                        10 IEC 320 receptacles max. One receptacle is
                               blocked on each power strip to control leakage.
Power Strip                    Single AC power strip supports one system drawer
                               and three StorageWorks shelves.
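The supported pedestal configurations can be checked against the VA figures with simple arithmetic. The sketch below uses only the per-unit loads (1100 VA per drawer, 150 VA per shelf) and two of the stated limits (1400 VA per line cord in North America, 2200 VA per power strip in Europe and Asia Pacific); the function names are hypothetical, and the check ignores any other derating rules a site may impose.

```python
DRAWER_VA = 1100   # single system drawer
SHELF_VA = 150     # single StorageWorks shelf


def load_va(drawers: int, shelves: int) -> int:
    """Total apparent power drawn by the given drawers and shelves."""
    return drawers * DRAWER_VA + shelves * SHELF_VA


def headroom_va(limit_va: int, drawers: int, shelves: int) -> int:
    """Remaining VA under a limit; a negative value means the limit is exceeded."""
    return limit_va - load_va(drawers, shelves)


if __name__ == "__main__":
    # North America: one drawer and one shelf on a 1400 VA line cord.
    print(load_va(1, 1), headroom_va(1400, 1, 1))
    # Europe and Asia Pacific: one drawer and three shelves on a 2200 VA strip.
    print(load_va(1, 3), headroom_va(2200, 1, 3))
```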
Chapter 5
Error Logs 


This chapter provides information on troubleshooting with error logs. The following 
topics are covered: 


Using Error Logs 

Using DECevent 

Error Log Examples and Analysis 

Troubleshooting IOD-Detected Errors 

Double Error Halts and Machine Checks While in PAL Mode 


Error registers are described in Chapter 6. 
5.1 Using Error Logs


Error detection is performed by CPUs, the IOD, and the EISA to PCI bus
bridge. (The IOD is the acronym used by software to refer to the system bus to
PCI bus bridge.)


Figure 5-1 Error Detector Placement
(Diagram labels: CPU Module; B-cache; Tag & Status; Duplicate Tag; System Bus
Data; System Bus Comd/add; Sys/PCI Bus Bridge; EISA; PCI. Legend: Parity logic,
Parity stored, ECC logic, ECC stored. Figure ID PKW0450-96.)
Lines Protected                      Device


ECC Protected
System bus data lines                IOD on every transaction,
                                     CPU when using the bus
B-cache                              IOD on every transaction,
                                     CPU when using the bus
Parity Protected
System bus command/address lines     IOD on every transaction,
                                     CPU when using the bus
Duplicate tag store                  IOD on every transaction,
                                     CPU when using the bus
B-cache index lines                  CPU
PCI bus                              IOD
EISA bus                             EISA bridge


As shown in Figure 5-1 and the accompanying table, the CPU chip is isolated by
transceivers (XVER) from the data and command/address lines on the module. This
allows the CPU chip access to the duplicate tag and B-cache while the system bus is 
in use. The CPU detects errors only when it is the consumer of the data. The IOD 
detects errors on each system bus cycle regardless of whether it is involved in the 
transaction. 


System bus errors detected by the CPU may also be detected by the IOD. It is 
necessary to check the IOD for errors any time there is a CPU machine check. 


• If the CPU sees bad data and the IOD does not, the CPU is at fault.


• If both the CPU and the IOD see bad data on the system bus, either memory or a
secondary CPU is the cause. In such a case, check the Dirty bit, bit<20>, in the IOD
MC_ERR1 Register. If the Dirty bit is set, the source of
the data is a CPU’s cache destined for a different CPU. If the Dirty bit is not
set, memory caused the bad data on the bus. In this case, multiple error log
entries occur and must be analyzed together to determine the cause of the error.
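The two rules above amount to a short decision procedure. The sketch below encodes them as written; the function name and return strings are hypothetical, and the MC_ERR1 value is assumed to be available as an integer taken from the IOD registers in the error log.

```python
DIRTY_BIT = 20  # Dirty bit position in the IOD MC_ERR1 Register


def isolate_bad_data(cpu_saw_error: bool, iod_saw_error: bool, mc_err1: int = 0) -> str:
    """Apply the CPU/IOD rules to name the likely source of bad data."""
    if cpu_saw_error and not iod_saw_error:
        return "CPU that logged the error"
    if cpu_saw_error and iod_saw_error:
        if (mc_err1 >> DIRTY_BIT) & 1:
            # Dirty set: the data came from a CPU's cache, bound for another CPU.
            return "another CPU's cache"
        return "memory"
    # The rules above only cover errors the CPU itself observed.
    return "not covered by these rules"


if __name__ == "__main__":
    print(isolate_bad_data(True, False))
    print(isolate_bad_data(True, True, mc_err1=1 << DIRTY_BIT))
    print(isolate_bad_data(True, True, mc_err1=0))
```

When both detectors report the error, the related error log entries still need to be read together, as noted above.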
5.1.1 Hard Errors


There are two categories of hard errors:


• System-independent errors detected by the CPU. These errors are processor
machine checks handled as MCHK 670 interrupts and are:


     Internal EV5 or EV56 cache errors
     CPU B-cache module errors


• System-dependent errors detected by both the CPU and IOD. These errors are
system machine checks handled as MCHK 660 interrupts and are:


     CPU-detected external reference errors
     IOD hard error interrupts


The IOD can detect hard errors on either side of the bridge.


5.1.2 Soft Errors


There are two categories of soft errors:


• System-independent errors detected and corrected by the CPU. These errors are
CPU module correctable errors handled as MCHK 630 interrupts.


• System-dependent errors that are correctable single-bit errors on the system bus
and are handled as MCHK 620 interrupts.
5.1.3 Error Log Events


Several different events are logged by OpenVMS and DIGITAL UNIX. Windows
NT does not log errors in this fashion.


Table 5-1 Types of Error Log Events


Error Log Event        Description


MCHK 670               Processor machine checks. These are synchronous
                       errors that report precisely what happened at the time
                       the error occurred. They are detected inside the CPU
                       chip and are fatal errors.

MCHK 660               System machine checks. These are asynchronous
                       errors that are recorded after the error has occurred.
                       Data on exactly what was going on in the machine at
                       the time of the error may not be known. They are
                       fatal errors.

MCHK 630               Processor correctable errors

MCHK 620               System correctable errors

Last fail              Used to collect system bus registers prior to crashing

I/O error interrupt    IOD error interrupts

System environment     Used to provide status on power, fans, and
                       temperature

Configuration          Used to provide system configuration information
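For quick triage of a log, the four machine check codes in Table 5-1 can be kept in a lookup table. This is a sketch for illustration; the dictionary and function names are hypothetical, and the fatal or correctable labels restate the table descriptions.

```python
# Machine check codes from Table 5-1: (event type, True if fatal).
MCHK_EVENTS = {
    670: ("processor machine check", True),
    660: ("system machine check", True),
    630: ("processor correctable error", False),
    620: ("system correctable error", False),
}


def describe_mchk(code: int) -> str:
    """Return a one-line description for an MCHK code, or raise KeyError."""
    name, fatal = MCHK_EVENTS[code]
    severity = "fatal" if fatal else "correctable"
    return f"MCHK {code}: {name} ({severity})"


if __name__ == "__main__":
    for code in sorted(MCHK_EVENTS, reverse=True):
        print(describe_mchk(code))
```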
5.2 Using DECevent


DECevent produces bit-to-text ASCII reports derived from system event entries
or user-supplied event logs. The format of the reports is determined by
commands, qualifiers, parameters, and keywords appended to the command. The
maximum command line length is 255 characters.


DECevent allows you to do the following: 


• Translate event log files into readable reports
• Select alternate input and output files
• Filter input events
• Select alternative reports
• Translate events as they occur
• Maintain and customize your environment with the interactive shell commands


To access on-line help: 


OpenVMS 


$ HELP DIAGNOSE or
$ DIA /INTERACTIVE
DIA> HELP


DIGITAL UNIX 


> man dia or 
> dia hlp 


Privileges necessary to use DECevent: 


• SYSPRV for the utility
• DIAGNOSE to use the /CONTINUOUS qualifier
5.2.1 Translating Event Files


To produce a translated event report using the default event log file, 
SYS$ERRORLOG:ERRLOG.SYS, enter the following command: 


OpenVMS 
$ DIAGNOSE 


DIGITAL UNIX 
> dia -a 


The DIAGNOSE command allows DECevent to use built-in defaults. This command 
produces a full report, directed to the terminal screen, from the input event file, 
SYS$ERRORLOG:ERRLOG.SYS. The /TRANSLATE qualifier is understood on 
the command line. 


To select an alternate input file 
OpenVMS 
$ DIAGNOSE ERRORLOG.OLD 


DIGITAL UNIX 
> dia -a -f syserr-old.hostname 


These commands select an alternate input file (ERRORLOG.OLD or syserr-old) as 
the event log to translate. The file name can contain the directory or path, if needed. 
Wildcard characters can be used. 


To send reports to an output file 


OpenVMS 
$ DIAGNOSE/OUTPUT=ERRLOG_OLD.TXT


DIGITAL UNIX 
> dia -a > syserr-old.txt 


These commands direct the output of DECevent to ERRLOG_OLD.TXT or 
syserr-old.txt.




To reverse the order of the input events 


OpenVMS 
$ DIAGNOSE/TRANSLATE/REVERSE


DIGITAL UNIX 
> dia -R 


These commands reverse the order in which events are displayed. The default order 
is forward chronologically. 


5.2.2 Filtering Events 


/INCLUDE and /EXCLUDE qualifiers allow you to filter input event log files. 


The /INCLUDE qualifier is used to create output for devices named in the 
command. 


OpenVMS 
$ DIAGNOSE/TRANSLATE/INCLUDE=(DISK=RZ,DISK=RA92,CPU)


DIGITAL UNIX 
> dia -i disk=rz disk=ra92 cpu 


The commands shown here create output using only the entries for RZ disks, RA92 
disks, and CPUs. 


The /EXCLUDE qualifier is used to create output for all devices except those named 
in the command. 


OpenVMS 
$ DIAGNOSE/TRANSLATE/EXCLUDE=(MEMORY)


DIGITAL UNIX 


> dia -x mem 
Use the /BEFORE and /SINCE qualifiers to select events before or after a certain
date and time.


OpenVMS 

$ DIAGNOSE/TRANSLATE/BEFORE=15-JAN-1996:10:30:00
or

$ DIAGNOSE/TRANSLATE/SINCE=15-JAN-1996:10:30:00


DIGITAL UNIX 
> dia -t s:15-jan-1996 e:20-jan-1996 


If no time is specified, the default time is 00:00:00, and all events for that day are 
selected. 


The /BEFORE and /SINCE qualifiers can be combined to select a certain period of 
time. 


OpenVMS 
$ DIAGNOSE/TRANSLATE/SINCE=15-JAN-1996/BEFORE=20-JAN-1996


If no value is supplied with the /SINCE or /BEFORE qualifiers, DECevent defaults 
to TODAY. 
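When several filters are combined, it can help to assemble the OpenVMS command programmatically and check it against the 255-character command line limit. The sketch below is illustrative: `build_diagnose` is a hypothetical helper, it uses only the /TRANSLATE, /INCLUDE, /EXCLUDE, /SINCE, and /BEFORE qualifiers shown in this section, and it does not validate device keywords or date formats.

```python
MAX_COMMAND_LENGTH = 255  # maximum DECevent command line length


def build_diagnose(include=(), exclude=(), since=None, before=None) -> str:
    """Assemble a DIAGNOSE command from the filtering qualifiers in this section."""
    cmd = "DIAGNOSE/TRANSLATE"
    if include:
        cmd += "/INCLUDE=(" + ",".join(include) + ")"
    if exclude:
        cmd += "/EXCLUDE=(" + ",".join(exclude) + ")"
    if since:
        cmd += "/SINCE=" + since
    if before:
        cmd += "/BEFORE=" + before
    if len(cmd) > MAX_COMMAND_LENGTH:
        raise ValueError("command exceeds 255 characters")
    return cmd


if __name__ == "__main__":
    print(build_diagnose(include=["DISK=RZ", "DISK=RA92", "CPU"]))
    print(build_diagnose(since="15-JAN-1996", before="20-JAN-1996"))
```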
5.2.3 Selecting Alternative Reports


Table 5-2 describes the DECevent report formats. Report formats are mutually 
exclusive. No combinations are allowed. The default format is /Full. 


Table 5-2 DECevent Report Formats 


Format      Description

/Full       Translates all available information for each event

/Brief      Translates key information for each event

/Terse      Provides binary event information and displays register values
            and other ASCII messages in a condensed format

/Summary    Produces a statistical summary of the events in the log

/Fsterr     Produces a one-line-per-entry report for disk and tape devices


The syntax is: 


OpenVMS 
$ DIAGNOSE/TRANSLATE/<format>


DIGITAL UNIX 
> dia -o <format> 




5.3 Error Log Examples and Analysis


The following sections provide examples and analysis of error logs. 


5.3.1 MCHK 670 CPU-Detected Failure


The error log in Example 5-1 shows the following: 


1) CPU1 logged the error in a system with two CPUs. 


2) During a D-ref fill, the External Interface Status Register logged an
uncorrectable ECC error. (When a CPU chip does not find data it needs to
perform a task in any of its caches, it requests data from off the chip to fill
its D-caches. It performs a “D-ref fill.”) Bit<30> is clear, indicating that
the source of the error is the B-cache.


3) Neither IOD CAP Error Register saw an error. 


The error was detected by a CPU and the data was not on the system bus. 
Otherwise, the IODs would have seen the error. Therefore, CPU1 is broken. 


NOTE: The error log example has been edited to decrease its size; registers of 
interest are in bold type. The “Horse” module referred to in the error log is the 
system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is 
the PCI motherboard, the B3050 module. The “MC” bus is the system bus. 


Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 
for information on node IDs. 
Example 5-1 MCHK 670


Logging OS                  2. DIGITAL UNIX
System Architecture         2. Alpha
Event sequence number       4.
Timestamp of occurrence     04-APR-1996 17:20:04
Host name                   whip16
System type register        x00000016  AlphaStation 4x00
Number of CPUs (mpnum)      x00000002  (1)
CPU logging event (mperr)   x00000001
Event validity              1. O/S claims event is valid
Event severity              1. Severe Priority
Entry type                  100. CPU Machine Check Errors
CPU Minor class             1. Machine check (670 entry)
Software Flags              x0000000300000000
                            IOD 1 Register Subpkt Pres
                            IOD 2 Register Subpkt Pres
Active CPUs                 x00000003
Hardware Rev                x00000000
System Serial Number        C1563
Module Serial Number
Module Type                 x0000
System Revision             x00000000

* MCHK 670 Regs *
Flags:                      x00000000
PCI Mask                    x0000
Machine Check Reason        x0098
PAL SHADOW REG 0            x00000000
PAL SHADOW REG 1            x00000000
PAL SHADOW REG 6            x00000000
PAL SHADOW REG 7            x00000000
PALTEMP0                    x00000000E87C7A58
PALTEMP1                    xFFFFFFFE8F658000
PALTEMP2                    xFFFFFC00003C9F40
PALTEMP22                   xFFFFFC00004F9D60
PALTEMP23                   x00000000E8709A58
Exception Address Reg       xFFFFFC00003BFB88
                            Native-mode instruction
                            Exception PC x3FFFFF00000EFEE2
Exception Summary Reg       x00000000
Exception Mask Reg          x00000000
PAL BASE                    x00000000020000
                            Base addr for palcode = x0000000008
Interrupt Summary Reg       x00000000
                            AST requests 3 - 0 x00000000
IBOX Ctrl and Status Reg    x000000C160000000
                            Timeout Bit Not Set
                            PAL Shadow Registers Enabled
                            Correctable Err Intrpts Enabled
                            ICACHE BIST Successful
                            TEST_STATUS_H Pin Asserted
Icache Par Err Stat Reg     x00000000
Dcache Par Err Stat Reg     x00000000
Virtual Address Reg         xFFFFFFFE8F63BD38
Memory Mgmt Flt Sts Reg     x000000000166D1
                            Ref which caused err was a write
                            Ref resulted in DTB miss
                            RA Field x0000000000001B
                            Opcode Field x0000000000002C
Scache Address Reg          xFFFFFF00000254BF
Scache Status Reg           x00000000
Bcache Tag Address Reg      xFFFFFF80E98F7FFF
                            External cache hit
                            Parity for ds and v bits
                            Cache block dirty
                            Cache block valid
                            Ext cache tag addr parity bit
                            Tag address<38:20> is x00000000000E98
Ext Interface Address Reg   xFFFFFF00E984DBCF
Fill Syndrome Reg           x0000000000002B
Ext Interface Status Reg    xFFFFFFF104FFFFFF  (2)
                            Uncorrectable ECC error
                            Error occurred during D-ref fill
LD LOCK                     xFFFFFF003797340F

** IOD SUBPACKET -> **      IOD 0 Register Subpacket
WHOAMI                      x000000BB   Device ID x0000003B
                                        Bcache Size = 2MB
                                        VCTY ASIC Rev = 0
                                        Module Revision 0.
Base Address of Bridge      x000000F9E0000000
PCI Revision                x06008021   CAP Chip Revision x00000001
                                        Horse Module Revision x00000002
                                        Saddle Module Revision x00000000
                                        Saddle Module Type Left Hand
                                        EISA Present
                                        PCI Class Code x00000600
MC-PCI Command Register     x06480FF1   Selftest passed
                                        Delayed read enabled
                                        Bridge PCI trans enabled
                                        Req 64 bit data trans enabled
                                        Accept 64 bit data trans enabled
                                        Check PCI Addr Parity enabled
                                        Check MC bus CMS/Addr Parity enabled
                                        Check MC bus NXM enabled
                                        Check all transaction enabled
                                        16 byte aligned block write enabled
                                        Write Pend Number Thresho x00000008
                                        RD_TYPE Short
                                        RL_TYPE Medium
                                        RM_TYPE Long
                                        ARB_MODE MC-PCI Bridge Priority Mode
Memory Host Addr Exten      x00000000
IO Host Addr Extension      x00000000
Interrupt Control           x00000003   MC-PCI Intr Enabled
                                        Device intr info enabled if en_int = 1
Interrupt Request           x00000000   Interrupts asserted x00000000
Interrupt Mask Register 0   x00C50010
Interrupt Mask Register 1   x00000000
MC Error Info Register 0    xE0000000   MC bus trans addr <31:4> x0E000000
MC Error Info Register 1    x000E88FD   MC bus trans addr <39:32> x000000FD
                                        MC_Command x00000008
                                        Device Id x0000003A
CAP Error Register          x00000000   (no error seen) (3)
PCI Bus Trans Error Adr     x00000000
MDPA Status Register        x00000000   MDPA Chip Revision x00000000
MDPA Error Syndrome Reg     x00000000   Cycle 0 ECC Syndrome x00000000
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000
MDPB Status Register        x00000000   MDPB Chip Revision x00000000
MDPB Error Syndrome Reg     x00000000   Cycle 0 ECC Syndrome x00000000
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000

** IOD SUBPACKET -> **      IOD 1 Register Subpacket
WHOAMI                      x000000BB   Device ID x0000003B
                                        Bcache Size = 2MB
                                        VCTY ASIC Rev = 0
                                        Module Revision 0.
Base Address of Bridge      x000000FBE0000000
PCI Revision                x06000021   CAP Chip Revision x00000001
                                        Horse Module Revision x00000002
                                        Saddle Module Revision x00000000
                                        Saddle Module Type Left Hand
                                        PCI Class Code x00000600
MC-PCI Command Register     x06480FF1   Selftest passed
                                        Delayed read enabled
                                        Bridge PCI trans enabled
                                        Req 64 bit data trans enabled
                                        Accept 64 bit data trans enabled
                                        Check PCI Addr Parity enabled
                                        Check MC bus CMS/Addr Parity enabled
                                        Check MC bus NXM enabled
                                        Check all transaction enabled
                                        16 byte aligned block write enabled
                                        Write Pend Number Thresho x00000008
                                        RD_TYPE Short
                                        RL_TYPE Medium
                                        RM_TYPE Long
                                        ARB_MODE MC-PCI Bridge Priority Mode
Memory Host Addr Exten      x00000000
IO Host Addr Extension      x00000000
Interrupt Control           x00000003   MC-PCI Intr Enabled
                                        Device intr info enabled if en_int = 1
Interrupt Request           x00000000   Interrupts asserted x00000000
Interrupt Mask Register 0   x00C50001
Interrupt Mask Register 1   x00000000
MC Error Info Register 0    xE0000000   MC bus trans addr <31:4> x0E000000
MC Error Info Register 1    x000E88FD   MC bus trans addr <39:32> x000000FD
                                        MC_Command x00000008
                                        Device Id x0000003A
CAP Error Register          x00000000   (no error seen) (3)
PCI Bus Trans Error Adr     xC0018B48
MDPA Status Register        x00000000   MDPA Chip Revision x00000000
MDPA Error Syndrome Reg     x00000000   Cycle 0 ECC Syndrome x00000000
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000
MDPB Status Register        x00000000   MDPB Chip Revision x00000000
MDPB Error Syndrome Reg     x00000000   Cycle 0 ECC Syndrome x00000000
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000

PALcode Revision            Palcode Rev: 1.21-3
5.3.2 MCHK 670 CPU and IOD-Detected Failure


The error log in Example 5-2 shows the following: 


1) CPU3 logged the error in a system with four CPUs.


2) The External Interface Status Register logged an uncorrectable ECC error
during a D-ref fill. (When a CPU chip does not find data it needs to perform a
task in any of its caches, it requests data from off the chip to fill its D-cache.
It performs a “D-ref fill.”) Bit <30> is set, indicating that the source of the
error is memory or the system. Bits <32> and <35> are set, indicating an
uncorrectable ECC error and a second external interface hard error,
respectively.


3) Both IOD CAP Error Registers logged an error.


4) The command at the time of the error was a read.


5) The bus master at the time of the error was CPU3.


6) The Dirty bit, bit <20> in the MC_ERR1 Register, is clear, indicating the
data is clean and comes from memory.


The error was detected by a CPU, and the data was on the system bus and is clean. 
Therefore, a memory module provided the wrong data. (If the Dirty bit had been set, 
the data would have come from the cache of another CPU.) To determine which 
memory, see Section 5.4 
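
The MC Error Info Register 1 fields used in this analysis can be pulled apart with a short Python sketch. Only the Dirty bit position (bit <20>) is stated in the text. The other field positions below (address bits <39:32> in bits <7:0>, MC_Command in bits <13:8>, Device Id in bits <19:14>, and "MC error info valid" in bit <31>) are assumptions inferred from the raw and decoded values printed in Examples 5-1 through 5-4, and should be confirmed against the register descriptions in Chapter 6. The function name is hypothetical.

```python
def decode_mc_err1(value):
    """Split an MC Error Info Register 1 value into the fields DECevent prints.

    Field positions other than the Dirty bit are inferred from the example
    logs in this chapter, not taken from the register specification.
    """
    return {
        "addr_39_32": value & 0xFF,          # MC bus trans addr <39:32>
        "mc_command": (value >> 8) & 0x3F,   # MC_Command
        "device_id": (value >> 14) & 0x3F,   # Device Id of the bus master
        "dirty": bool(value & (1 << 20)),    # Dirty bit, bit <20>
        "valid": bool(value & (1 << 31)),    # MC error info valid
    }


# Value logged by both IODs in Example 5-2.
fields = decode_mc_err1(0x800FD800)
print({name: hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
       for name, v in fields.items()})
```

For the Example 5-2 value x800FD800 this yields MC_Command x18, Device Id x3F, and a clear Dirty bit, matching the decoded lines in the log. Use Table 5-9 and Table 5-10 to translate the command and the device ID.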


NOTE: The error log example has been edited to decrease its size; registers of 
interest are in bold type. The “Horse” module referred to in the error log is the 
system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is 
the PCI motherboard, the B3050 module. The “MC” bus is the system bus. 


Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 
for information on node IDs. 
Example 5-2 MCHK 670 CPU and IOD-Detected Failure


Logging OS                  2. DIGITAL UNIX
System Architecture         2. Alpha
Event sequence number       6.
Timestamp of occurrence     08-APR-1996 11:27:55
Host name                   whip16
System type register        x00000016  AlphaStation 4x00
Number of CPUs (mpnum)      x00000004  (1)
CPU logging event (mperr)   x00000003
Event validity              1. O/S claims event is valid
Event severity              1. Severe Priority
Entry type                  100. CPU Machine Check Errors
CPU Minor class             1. Machine check (670 entry)
Software Flags              x0000000300000000
                            IOD 1 Register Subpkt Pres
                            IOD 2 Register Subpkt Pres
Active CPUs                 x0000000F
Hardware Rev                x00000000
System Serial Number        C1563
Module Serial Number
Module Type                 x0000
System Revision             x00000000

* MCHK 670 Regs *
Flags:                      x00000000
PCI Mask                    x0000
Machine Check Reason        x0098
PAL SHADOW REG 0            x00000000
PAL SHADOW REG 1            x00000000
PAL SHADOW REG 6            x00000000
PAL SHADOW REG 7            x00000000
PALTEMP0                    x00000001401A7A90
PALTEMP1                    x00000000000021
PALTEMP23                   x00000000ECE77A58
Exception Address Reg       x000000012005A8B4
                            Native-mode instruction
                            Exception PC x0000000048016A2D
Exception Summary Reg       x00000000
Exception Mask Reg          x00000000
PAL BASE                    x00000000020000
                            Base addr for palcode = x0000000008
Interrupt Summary Reg       x00000000
                            AST requests 3 - 0 x00000000
IBOX Ctrl and Status Reg    x000000C164000000
                            Timeout Bit Not Set
                            Floating Point Instr. may be issued
                            PAL Shadow Registers Enabled
                            Correctable Err Intrpts Enabled
                            ICACHE BIST Successful
                            TEST_STATUS_H Pin Asserted
Icache Par Err Stat Reg     x00000000
Dcache Par Err Stat Reg     x00000000
Virtual Address Reg         x00000001407D6000
Memory Mgmt Flt Sts Reg     x00000000011A10
                            Ref resulted in DTB miss
                            RA Field x0000000008
                            Opcode Field x00000000000023
Scache Address Reg          xFFFFFF00000254BF
Scache Status Reg           x00000000
Bcache Tag Address Reg      xFFFFFF80286F7FFF
                            External cache hit
                            Parity for ds and v bits
                            Cache block dirty
                            Cache block valid
                            Ext cache tag addr parity bit
                            Tag address<38:20> is x00000000000286
Ext Interface Address Reg   xFFFFFF0028681A8F
Fill Syndrome Reg           x00000000004B00
Ext Interface Status Reg    xFFFFFFF984FFFFFF  (2)
                            Uncorrectable ECC error
                            Error occurred during D-ref fill
                            Second external interface hard error
LD LOCK                     xFFFFFF000020040F

** IOD SUBPACKET -> **      IOD 0 Register Subpacket
WHOAMI                      x000000BF   Device ID x0000003F
                                        Bcache Size = 2MB
                                        VCTY ASIC Rev = 0
                                        Module Revision 0.
Base Address of Bridge      x000000F9E0000000
PCI Revision                x06008021   CAP Chip Revision x00000001
                                        Horse Module Revision x00000002
                                        Saddle Module Revision x00000000
                                        Saddle Module Type Left Hand
                                        EISA Present
                                        PCI Class Code x00000600
MC-PCI Command Register     x06460FF1   Selftest passed
                                        Delayed read enabled
                                        Bridge PCI trans enabled
                                        Req 64 bit data trans enabled
                                        Accept 64 bit data trans enabled
                                        Check PCI Addr Parity enabled
                                        Check MC bus CMS/Addr Parity enabled
                                        Check MC bus NXM enabled
                                        Check all transaction enabled
                                        16 byte aligned block write enabled
                                        Write Pend Number Thresho x00000006
                                        RD_TYPE Short
                                        RL_TYPE Medium
                                        RM_TYPE Long
                                        ARB_MODE MC-PCI Bridge Priority Mode
Memory Host Addr Exten      x00000000
IO Host Addr Extension      x00000000
Interrupt Control           x00000003   MC-PCI Intr Enabled
                                        Device intr info enabled if en_int = 1
Interrupt Request           x00810000   Interrupts asserted x00010000
                                        Hard Error
Interrupt Mask Register 0   x00C50010
Interrupt Mask Register 1   x00000000
MC Error Info Register 0    x28681A80   MC bus trans addr <31:4> x028681A8 (6)
MC Error Info Register 1    x800FD800   MC bus trans addr <39:32> x00000000
                                        MC_Command x00000018 (4)
                                        Device Id x0000003F (5)
                                        MC error info valid
CAP Error Register          xC0000000   Uncorrectable ECC err det by MDPB
                                        MC error info latched (3)
PCI Bus Trans Error Adr     x000003FD
MDPA Status Register        x00000000   MDPA Chip Revision x00000000
MDPA Error Syndrome Reg     x00000000   Cycle 0 ECC Syndrome x00000000
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000
MDPB Status Register        x80000000   MDPB Chip Revision x00000000
                                        MDPB Error Syndrome of
                                        uncorrectable read error
MDPB Error Syndrome Reg     x0000004B   Cycle 0 ECC Syndrome x0000000000004B
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000

** IOD SUBPACKET -> **      IOD 1 Register Subpacket
WHOAMI                      x000000BF   Device ID x0000003F
                                        Bcache Size = 2MB
                                        VCTY ASIC Rev = 0
                                        Module Revision 0.
Base Address of Bridge      x000000FBE0000000
PCI Revision                x06000021   CAP Chip Revision x00000001
                                        Horse Module Revision x00000002
                                        Saddle Module Revision x00000000
                                        Saddle Module Type Left Hand
                                        PCI Class Code x00000600
MC-PCI Command Register     x06460FF1   Selftest passed
                                        Delayed read enabled
                                        Bridge PCI trans enabled
                                        Req 64 bit data trans enabled
                                        Accept 64 bit data trans enabled
                                        Check PCI Addr Parity enabled
                                        Check MC bus CMS/Addr Parity enabled
                                        Check MC bus NXM enabled
                                        Check all transaction enabled
                                        16 byte aligned block write enabled
                                        Write Pend Number Thresho x00000006
                                        RD_TYPE Short
                                        RL_TYPE Medium
                                        RM_TYPE Long
                                        ARB_MODE MC-PCI Bridge Priority Mode
Memory Host Addr Exten      x00000000
IO Host Addr Extension      x00000000
Interrupt Control           x00000003   MC-PCI Intr Enabled
                                        Device intr info enabled if en_int = 1
Interrupt Request           x00800000   Interrupts asserted x00000000
                                        Hard Error
Interrupt Mask Register 0   x00C50001
Interrupt Mask Register 1   x00000000
MC Error Info Register 0    x28681A80   MC bus trans addr <31:4> x028681A8 (6)
MC Error Info Register 1    x800FD800   MC bus trans addr <39:32> x00000000
                                        MC_Command x00000018 (4)
                                        Device Id x0000003F (5)
                                        MC error info valid
CAP Error Register          xC0000000   Uncorrectable ECC err det by MDPB
                                        MC error info latched (3)
PCI Bus Trans Error Adr     x00000000
MDPA Status Register        x00000000   MDPA Chip Revision x00000000
MDPA Error Syndrome Reg     x00000000   Cycle 0 ECC Syndrome x00000000
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000
MDPB Status Register        x80000000   MDPB Chip Revision x00000000
                                        MDPB Error Syndrome of
                                        uncorrectable read error
MDPB Error Syndrome Reg     x0000004B   Cycle 0 ECC Syndrome x0000000000004B
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000

PALcode Revision            Palcode Rev: 1.21-3
5.3.3 MCHK 670 Read Dirty CPU-Detected Failure


The error log in Example 5-3 shows the following:


1) CPU0 logged the error in a system with two CPUs.


2) The External Interface Status Register records an uncorrectable ECC error
from the system (bit <30> set).


3) Both IOD CAP Error Registers logged an error.


4) The MC Error Info Registers 0 and 1 have captured the error information.


5) The commander at the time of the error was CPU0 (known from MC_ERR1).


6) The command on the bus at the time was a read memory command.


7) The address read was a memory address, not an I/O address.


8) The data associated with the read was dirty.


From this information you know that CPU0 requested data that was dirty; therefore,
neither memory nor an I/O device provided it. Only another CPU could have
provided the data from its cache. There is only one other CPU in this system, and it
is faulty. Had there been more than two CPUs, you could not have isolated the
error to a particular CPU. See Section 5.4 for a procedure designed to help with
IOD-detected errors.
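
The reasoning in Sections 5.3.1 through 5.3.3 can be summarized as a small decision function. This is a sketch of the logic as described in the text, not a replacement for the procedure in Section 5.4. The function name, parameter names, and returned strings are hypothetical.

```python
def suspect_for_mchk_670(source_is_memory_or_system, iod_cap_errors,
                         dirty, cpu_count):
    """Suggest the likely failing component for an MCHK 670 entry.

    source_is_memory_or_system: EI status bit <30> is set
    iod_cap_errors: number of IOD CAP Error Registers that logged an error
    dirty: Dirty bit, bit <20> of MC_ERR1, is set
    cpu_count: number of CPUs in the system
    """
    if not source_is_memory_or_system and iod_cap_errors == 0:
        # Section 5.3.1: the data never appeared on the system bus.
        return "logging CPU"
    if source_is_memory_or_system and iod_cap_errors > 0:
        if not dirty:
            # Section 5.3.2: clean data comes from memory.
            return "memory module (see Section 5.4)"
        if cpu_count == 2:
            # Section 5.3.3: only one other CPU could have supplied the data.
            return "the other CPU"
        return "a CPU other than the requester (see Section 5.4)"
    return "undetermined (see Section 5.4)"


print(suspect_for_mchk_670(True, 2, True, 2))
```

With the Example 5-3 inputs (system source, both IODs reporting, dirty data, two CPUs) the sketch points to the other CPU, as in the analysis above.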


NOTE: The error log example has been edited to decrease its size; registers of 
interest are in bold type. The “Horse” module referred to in the error log is the 
system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is 
the PCI motherboard, the B3050 module. The “MC” bus is the system bus. 


Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 
for information on node IDs. 
Example 5-3 MCHK 670 Read Dirty Failure


Logging OS                  2. DIGITAL UNIX
System Architecture         2. Alpha
Event sequence number       4.
Timestamp of occurrence     08-APR-1996 10:20:37
Host name                   sect06
System type register        x00000016  AlphaStation 4x00
Number of CPUs (mpnum)      x00000002
CPU logging event (mperr)   x00000000  (1)
Event validity              1. O/S claims event is valid
Event severity              1. Severe Priority
Entry type                  100. CPU Machine Check Errors
CPU Minor class             1. Machine check (670 entry)
Software Flags              x0000000300000000
                            IOD 0 Register Subpkt Pres
                            IOD 1 Register Subpkt Pres
Active CPUs                 x00000003
Hardware Rev                x00000000
System Serial Number        C1563
Module Serial Number
Module Type                 x0000
System Revision             x00000000

* MCHK 670 Regs *
Flags:                      x00000000
PCI Mask                    x0000
Machine Check Reason        x0098  Fatal Alpha Chip Detected Hard Error
PAL SHADOW REG 0            x0000000000000000
PAL SHADOW REG 1            x0000000000000000
PAL SHADOW REG 2            x0000000000000000
PAL SHADOW REG 3            x0000000000000000
PAL SHADOW REG 4            x0000000000000000
PAL SHADOW REG 5            x0000000000000000
PAL SHADOW REG 6            x0000000000000000
PAL SHADOW REG 7            x0000000000000000
PALTEMP0                    xFFFFFC00006C00C0
PALTEMP1                    x00000000000061A8
PALTEMP2                    xFFFFFC00004E1E00
PALTEMP22                   xFFFFFC00006530E0
PALTEMP23                   x0000000003D2BA58
Exception Address Reg       xFFFFFC000047395C
                            Native-mode Instruction
                            Exception PC x3FFFFF000011CE57
Exception Summary Reg       x0000000000000000
Exception Mask Reg          x0000000000000000
PAL Base Address Reg        x0000000000020000
                            Base Addr for PALcode: x0000000000000008
Interrupt Summary Reg       x0000000000200000
                            External HW Interrupt at IPL21
                            AST Requests 3-0: x0000000000000000
IBOX Ctrl and Status Reg    x000000C160000000
                            Timeout Counter Bit Clear.
                            IBOX Timeout Counter Enabled.
                            Floating Point Instructions will cause FEN Exceptions.
                            PAL Shadow Registers Enabled.
                            Correctable Error Interrupts Enabled.
                            ICACHE BIST (Self Test) Was Successful.
                            TEST_STATUS_H Pin Asserted
Icache Par Err Stat Reg     x0000000000000000
Dcache Par Err Stat Reg     x0000000000000000
Virtual Address Reg         x0000000000044000
Memory Mgmt Flt Sts Reg     x0000000000005D10
                            If Err, Reference Resulted in DTB Miss
                            Fault Inst RA Field: x0000000000000014
                            Fault Inst Opcode: x000000000000000B
Scache Address Reg          xFFFFFF00000254BF
Scache Status Reg           x0000000000000000
Bcache Tag Address Reg      xFFFFFF8007EE2FFF
                            Last Bcache Access Resulted in a Miss.
                            Value of Parity Bit for Tag Control Status
                            Bits Dirty, Shared & Valid is Set.
                            Value of Tag Control Dirty Bit is Clear.
                            Value of Tag Control Shared Bit is Clear.
                            Value of Tag Control Valid Bit is Clear.
                            Value of Parity Bit Covering Tag Store
                            Address Bits is Set.
                            Tag Address<38:20> Is: x000000000000007E
Ext Interface Address Reg   xFFFFFF0007FBF08F
Fill Syndrome Reg           x000000000000D189
Ext Interface Status Reg    xFFFFFFF944FFFFFF  (2)
                            Error Source is Memory or System
                            UNCORRECTABLE ECC ERROR
                            Error Occurred During D-ref Fill
                            Error
LD LOCK                     xFFFFFF0007FBF00F

** IOD SUBPACKET -> **      IOD 0 Register Subpacket
WHOAMI                      x000000BA   Module Revision 0.
                                        VCTY ASIC Rev = 0
                                        Bcache Size = 2MB
                                        MID 2.
                                        GID 7.
Base Address of Bridge      x000000F9E0000000
Dev Type & Rev Register     x06008021   CAP Chip Revision: x00000001
                                        HORSE Module Revision: x00000002
                                        SADDLE Module Revision: x00000000
                                        SADDLE Module Type: LeftHand
                                        PCI-EISA Bus Bridge Present on PCI Segment
                                        PCI Class Code x00000600
MC-PCI Command Register     x06480FF1   Module SelfTest Passed LED on
                                        Delayed PCI Bus Reads Protocol: Enabled
                                        Bridge to PCI Transactions: Enabled
                                        Bridge REQUESTS 64 Bit Data Transactions
                                        Bridge ACCEPTS 64 Bit Data Transactions
                                        PCI Address Parity Check: Enabled
                                        MC Bus CMD/Addr Parity Check: Enabled
                                        MC Bus NXM Check: Enabled
                                        Check ALL Transactions for Errors
                                        Use MC_BMSK for 16 Byte Align Blk Mem Wrt
                                        Wrt PEND_NUM Threshold: 8.
                                        RD_TYPE Memory Prefetch Algorithm: Short
                                        RL_TYPE Mem Rd Line Prefetch Type: Medium
                                        RM_TYPE Mem Rd Multiple Cmd Type: Long
                                        ARB_MODE Arbitration: MC-PCI Priority Mode
Mem Host Address Ext Reg    x00000000   HAE Sparse Mem Adr<31:27> x00000000
IO Host Adr Ext Register    x00000000   PCI Upper Adr Bits<31:25> x00000000
Interrupt Ctrl Register     x00000003   Write Device Interrupt Info Struct :Enabled
Interrupt Request           x00800000   Interrupts asserted x00000000
                                        Hard Error
Interrupt Mask0 Register    x00C50010
Interrupt Mask1 Register    x00000000
MC Error Info Register 0    x07FBF080   MC Bus Trans Addr<31:4>: 7FBF080 (7)
MC Error Info Register 1    x801E8800   MC bus trans addr <39:32> x00000000
                                        MC Command is Read0-Mem (6)
                                        Device ID 2 x00000002 (5)
                                        MC bus error assoc w read/dirty (8)
                                        MC error info valid
CAP Error Register          xE0000000   Uncorrectable ECC err det by MDPA (3)
                                        Uncorrectable ECC err det by MDPB
                                        MC error info latched (4)
Sys Environmental Regs      x00000000
PCI Bus Trans Error Adr     x00000000
MDPA Status Register        xC0000000   MDPA Status Register Data Not Valid
MDPA Error Syndrome Reg     x00080089   MDPA Syndrome Register Data Not Valid
MDPB Status Register        x80000000   MDPB Status Register Data Not Valid
MDPB Error Syndrome Reg     x000D00D1   MDPB Syndrome Register Data Not Valid


** IOD SUBPACKET -> **      IOD 1 Register Subpacket
WHOAMI                      x000000BA   Module Revision 0.
                                        VCTY ASIC Rev = 0
                                        Bcache Size = 2MB
                                        MID 2.
                                        GID 7.
Base Address of Bridge      x000000FBE0000000
Dev Type & Rev Register     x06000021   CAP Chip Revision: x00000001
                                        HORSE Module Revision: x00000002
                                        SADDLE Module Revision: x00000000
                                        SADDLE Module Type: LeftHand
                                        Internal CAP Chip Arbiter: Enabled
                                        PCI Class Code x00000600
MC-PCI Command Register     x06480FF1   Module SelfTest Passed LED on
                                        Delayed PCI Bus Reads Protocol: Enabled
                                        Bridge to PCI Transactions: Enabled
                                        Bridge REQUESTS 64 Bit Data Transactions
                                        Bridge ACCEPTS 64 Bit Data Transactions
                                        PCI Address Parity Check: Enabled
                                        MC Bus CMD/Addr Parity Check: Enabled
                                        MC Bus NXM Check: Enabled
                                        Check ALL Transactions for Errors
                                        Use MC_BMSK for 16 Byte Align Blk Mem Wrt
                                        Wrt PEND_NUM Threshold: 8.
                                        RD_TYPE Memory Prefetch Algorithm: Short
                                        RL_TYPE Mem Rd Line Prefetch Type: Medium
                                        RM_TYPE Mem Rd Multiple Cmd Type: Long
                                        ARB_MODE Arbitration: MC-PCI Priority Mode
Mem Host Address Ext Reg    x00000000   HAE Sparse Mem Adr<31:27> x00000000
IO Host Adr Ext Register    x00000000   PCI Upper Adr Bits<31:25> x00000000
Interrupt Ctrl Register     x00000003   Write Device Interrupt Info Struct :Enabled
Interrupt Request           x00800001   Interrupts asserted x00000001
                                        Hard Error
Interrupt Mask0 Register    x00C50001
Interrupt Mask1 Register    x00000000
MC Error Info Register 0    x07FBF080   MC Bus Trans Addr<31:4>: 7FBF080 (7)
MC Error Info Register 1    x801E8800   MC bus trans addr <39:32> x00000000
                                        MC Command is Read0-Mem (6)
                                        Device ID 2 x00000002 (5)
                                        MC bus error assoc w read/dirty (8)
                                        MC error info valid
CAP Error Register          xE0000000   Uncorrectable ECC err det by MDPA (3)
                                        Uncorrectable ECC err det by MDPB
                                        MC error info latched (4)
Sys Environmental Regs      x00000000
PCI Bus Trans Error Adr     x00000000
MDPA Status Register        xC0000000   MDPA Status Register Data Not Valid
MDPA Error Syndrome Reg     x00080089   MDPA Syndrome Register Data Not Valid
MDPB Status Register        x80000000   MDPB Status Register Data Not Valid
MDPB Error Syndrome Reg     x000D00D1   MDPB Syndrome Register Data Not Valid

PALcode Revision            Palcode Rev: 1.21-3




5.3.4 MCHK 660 IOD-Detected Failure (System Bus Error)


The error log in Example 5-4 shows the following:


1) CPU0 logged the error in a system with two CPUs.


2) The External Interface Status Register does not record an error.


3) Both IOD CAP Error Registers logged an error.


4) The MC Error Info Registers 0 and 1 captured the error information.


5) The commander at the time of the error was CPU1 (known from
MC_ERR1).


6) The command on the bus at the time was a write-back memory command.


Since this is an MCHK 660, the IOD detected the error on the bus, and CPU0 is
logging the error. The CPU0 registers are not important in this case, since CPU0 is
servicing the IOD interrupt. Three kinds of devices can put data on the system bus:
CPUs, memory, and IODs. From MC_ERR Register 1 we know that at the time of
the error CPU1 put bad data on the bus while writing to memory. See Section 5.4
for a procedure designed to help with IOD-detected errors.
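
As a quick way to read the CAP Error Register values that appear in these examples, the following Python sketch maps the three high-order bits to the messages DECevent prints. The bit assignments (bit <31> for "MC error info latched", bit <30> for MDPB, bit <29> for MDPA) are assumptions inferred from the values xA0000000, xC0000000, and xE0000000 and their decoded text in Examples 5-2 through 5-4. The register may define further bits that are not covered here; see Chapter 6. The function and constant names are hypothetical.

```python
# Bit assignments inferred from the example logs, not from the register spec.
CAP_ERR_MC_INFO_LATCHED = 1 << 31
CAP_ERR_UNC_ECC_MDPB = 1 << 30
CAP_ERR_UNC_ECC_MDPA = 1 << 29


def decode_cap_err(value):
    """Return the DECevent-style messages for a CAP Error Register value."""
    messages = []
    if value & CAP_ERR_UNC_ECC_MDPA:
        messages.append("Uncorrectable ECC err det by MDPA")
    if value & CAP_ERR_UNC_ECC_MDPB:
        messages.append("Uncorrectable ECC err det by MDPB")
    if value & CAP_ERR_MC_INFO_LATCHED:
        messages.append("MC error info latched")
    return messages


# Value logged by both IODs in Example 5-4.
print(decode_cap_err(0xA0000000))
```

An empty list corresponds to the "(no error seen)" case shown in Example 5-1.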


NOTE: The error log example has been edited to decrease its size; registers of 
interest are in bold type. The “Horse” module referred to in the error log is the 
system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is 
the PCI motherboard, the B3050 module. The “MC” bus is the system bus. 


Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 
for information on node IDs. 
Example 5-4 MCHK 660 IOD-Detected Failure (System Bus Error)


Logging OS                  2. DIGITAL UNIX
System Architecture         2. Alpha
Event sequence number       6.
Timestamp of occurrence     04-APR-1996 17:20:04
Host name                   whip16
System type register        x00000016  AlphaStation 4x00
Number of CPUs (mpnum)      x00000002
CPU logging event (mperr)   x00000000  (1)
Event validity              1. O/S claims event is valid
Event severity              1. Severe Priority
Entry type                  100. CPU Machine Check Errors
CPU Minor class             2. 660 Entry
Software Flags              x0000000300000000
                            IOD 1 Register Subpkt Pres
                            IOD 2 Register Subpkt Pres
Active CPUs                 x00000003
Hardware Rev                x00000000
System Serial Number        C1563
Module Serial Number
Module Type                 x0000
System Revision             x00000000

* MCHK 660 Regs *
Flags:                      x00000000
PCI Mask                    x0000
Machine Check Reason        x0202
PAL SHADOW REG 0            x00000000
PAL SHADOW REG 7            x00000000
PALTEMP0                    x0000000007
PALTEMP23                   x00000000047FDA58
Exception Address Reg       xFFFFFC000038D784
                            Native-mode instruction
                            Exception PC x3FFFFF00000E35E1
Exception Summary Reg       x00000000
Exception Mask Reg          x00000000
PAL BASE                    x00000000020000
                            Base addr for palcode = x0000000008
Interrupt Summary Reg       x00000000200000
                            EXT. HW interrupt at IPL21
                            AST requests 3 - 0 x00000000
IBOX Ctrl and Status Reg    x000000C160000000
                            Timeout Bit Not Set
                            PAL Shadow Registers Enabled
                            Correctable Err Intrpts Enabled
                            ICACHE BIST Successful
                            TEST_STATUS_H Pin Asserted
Icache Par Err Stat Reg     x00000000
Dcache Par Err Stat Reg     x00000000
Virtual Address Reg         xFFFFFFFFFF800130
Memory Mgmt Flt Sts Reg     x00000000014990
                            Ref resulted in DTB miss
                            RA Field x0000000006
                            Opcode Field x00000000000029
Scache Address Reg          xFFFFFF0000024EAF
Scache Status Reg           x00000000
Bcache Tag Address Reg      xFFFFFF80FFED6FFF
                            Parity for ds and v bits
                            Cache block dirty
                            Cache block valid
                            Tag address<38:20> is x00000000000FFE
Ext Interface Address Reg   xFFFFFFOOFCOQOQOO0F
Fill Syndrome Reg           x0000000000C5D2
Ext Interface Status Reg    xFFFFFFF004FFFFFF
                            Error occurred during D-ref fill (2)
LD LOCK                     xFFFFFF000020065F

** IOD SUBPACKET -> **      IOD 0 Register Subpacket


WHOAMI                      x000000BA   Device ID x0000003A
                                        Bcache Size = 2MB
                                        VCTY ASIC Rev = 0
                                        Module Revision 0.
Base Address of Bridge      x000000F9E0000000
PCI Revision                x06008021   CAP Chip Revision x00000001
                                        Horse Module Revision x00000002
                                        Saddle Module Revision x00000000
                                        Saddle Module Type Left Hand
                                        EISA Present
                                        PCI Class Code x00000600
MC-PCI Command Register     x06480FF1   Selftest passed
                                        Delayed read enabled
                                        Bridge PCI trans enabled
                                        Req 64 bit data trans enabled
                                        Accept 64 bit data trans enabled
                                        Check PCI Addr Parity enabled
                                        Check MC bus CMS/Addr Parity enabled
                                        Check MC bus NXM enabled
                                        Check all transaction enabled
                                        16 byte aligned block write enabled
                                        Write Pend Number Thresho x00000008
                                        RD_TYPE Short
                                        RL_TYPE Medium
                                        RM_TYPE Long
                                        ARB_MODE MC-PCI Bridge Priority Mode
Memory Host Addr Exten      x00000000
IO Host Addr Extension      x00000000
Interrupt Control           x00000003   MC-PCI Intr Enabled
                                        Device intr info enabled if en_int = 1
Interrupt Request           x00800000   Interrupts asserted x00000000
                                        Hard Error
Interrupt Mask Register 0   x00C50010
Interrupt Mask Register 1   x00000000
MC Error Info Register 0    x4A26DBF0   MC bus trans addr <31:4> x04A26DBF
MC Error Info Register 1    x800ED600   MC bus trans addr <39:32> x00000000
                                        MC_Command x00000016 (6)
                                        Device Id x0000003B (5)
                                        MC error info valid
CAP Error Register          xA0000000   Uncorrectable ECC err det by MDPA (3)
                                        MC error info latched (4)
PCI Bus Trans Error Adr     x00000000
MDPA Status Register        x80000000   MDPA Chip Revision x00000000
                                        MDPA Error Syndrome of
                                        uncorrectable read error
MDPA Error Syndrome Reg     x1E00001E   Cycle 0 ECC Syndrome x0000000000001E
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x0000000000001E
MDPB Status Register        x00000000   MDPB Chip Revision x00000000
MDPB Error Syndrome Reg     x00000000   Cycle 0 ECC Syndrome x00000000
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000

** IOD SUBPACKET -> **      IOD 1 Register Subpacket
WHOAMI                      x000000BA   Device ID x0000003A
                                        Bcache Size = 2MB
                                        VCTY ASIC Rev = 0
                                        Module Revision 0.
Base Address of Bridge      x000000FBE0000000
PCI Revision                x06000021   CAP Chip Revision x00000001
                                        Horse Module Revision x00000002
                                        Saddle Module Revision x00000000
                                        Saddle Module Type Left Hand
                                        PCI Class Code x00000600
MC-PCI Command Register     x06480FF1   Selftest passed
                                        Delayed read enabled
                                        Bridge PCI trans enabled
                                        Req 64 bit data trans enabled
                                        Accept 64 bit data trans enabled
                                        Check PCI Addr Parity enabled
                                        Check MC bus CMS/Addr Parity enabled
                                        Check MC bus NXM enabled
                                        Check all transaction enabled
                                        16 byte aligned block write enabled
                                        Write Pend Number Thresho x00000008
                                        RD_TYPE Short
                                        RL_TYPE Medium
                                        RM_TYPE Long
                                        ARB_MODE MC-PCI Bridge Priority Mode
Memory Host Addr Exten      x00000000
IO Host Addr Extension      x00000000
Interrupt Control           x00000003   MC-PCI Intr Enabled
                                        Device intr info enabled if en_int = 1
Interrupt Request           x00800000   Interrupts asserted x00000000
                                        Hard Error
Interrupt Mask Register 0   x00C50001
Interrupt Mask Register 1   x00000000
MC Error Info Register 0    x4A26DBF0   MC bus trans addr <31:4> x04A26DBF
MC Error Info Register 1    x800ED600   MC bus trans addr <39:32> x00000000
                                        MC_Command x00000016 (6)
                                        Device Id x0000003B (5)
                                        MC error info valid
CAP Error Register          xA0000000   Uncorrectable ECC err det by MDPA (3)
                                        MC error info latched (4)
PCI Bus Trans Error Adr     x00000000
MDPA Status Register        x80000000   MDPA Chip Revision x00000000
                                        MDPA Error Syndrome of
                                        uncorrectable read error
MDPA Error Syndrome Reg     x1E00001E   Cycle 0 ECC Syndrome x00000000
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000
MDPB Status Register        x00000000   MDPB Chip Revision x00000000
MDPB Error Syndrome Reg     x00000000   Cycle 0 ECC Syndrome x00000000
                                        Cycle 1 ECC Syndrome x00000000
                                        Cycle 2 ECC Syndrome x00000000
                                        Cycle 3 ECC Syndrome x00000000

PALcode Revision            Palcode Rev: 1.21-3
5.3.5 MCHK 660 IOD-Detected Failure (PCI Error)


The error log in Example 5-5 shows the following:


1) CPU0 logged the error in a system with three CPUs.


2) The External Interface Status register records that the error occurred during
a D-ref fill but does not indicate what the error is.


3) The CAP Error register for IOD0 did not see an error.


4) The CAP Error register for IOD1, however, records a serious error.


5) The MC Error Info registers 0 and 1 captured the error information.


6) The commander at the time of the error was CPU0 and the command was a
Read-IO (known from MC_ERR1).


7) There is a PCI subpacket from PCI1 with five nodes on it. Three devices
on the PCI bus did not see an error; however, two did: the Mylex DAC960
and the DEC_KZPSA. Either device could have caused the parity error.


Since this is an MCHK 660, the IOD detected the error on the bus, and CPU0 is
logging the error. The CPU0 registers are not important in this case, since CPU0 is
servicing the IOD interrupt. Three kinds of devices can put data on the system bus:
CPUs, memory, and IODs. The CAP Error register for IOD1 saw a serious error, and
the MC Error Info registers captured the error information. See Section 5.4 for a
procedure designed to help with IOD-detected errors.

NOTE: The error log example has been edited to decrease its size; registers of 
interest are in bold type. The “Horse” module referred to in the error log is the 
system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is 
the PCI motherboard, the B3050 module. The “MC” bus is the system bus. 


Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 
for information on node IDs. 
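
The way the commander and command are read out of MC_ERR1 can be sketched in a few lines of Python. This is only an illustration: the field positions used here (address <39:32> in bits <7:0>, MC command in bits <13:8>, device ID in bits <19:14>, valid flag in bit <31>) are not stated in this section. They are an assumption inferred from the decoded values printed in the error log examples. The function name and dictionary keys are hypothetical.

```python
def decode_mc_err1(value):
    """Split a 32-bit MC_ERR1 value into its fields (assumed layout)."""
    addr_39_32 = value & 0xFF              # bits <7:0>: MC bus address <39:32>
    command = (value >> 8) & 0x3F          # bits <13:8>: six-bit MC command
    device_id = (value >> 14) & 0x3F       # bits <19:14>: device ID
    return {
        "addr_39_32": addr_39_32,
        "io_space": bool(addr_39_32 & 0x80),   # address bit <39> set means I/O
        "command": command,
        "device_id": device_id,
        "mid": device_id & 0x7,                # low three bits: module ID (MID)
        "valid": bool((value >> 31) & 1),      # bit <31>: MC error info valid
    }


if __name__ == "__main__":
    # MC Error Info Register 1 as logged for IOD1 in Example 5-5.
    for name, field in decode_mc_err1(0x000E89FD).items():
        print(name, field)
```

For x000E89FD the sketch gives command x09 with address bit <39> set (a Read1 - I/O command) and MID 2, which agrees with the "CPU0 Master at Time of Error" and "Device ID 2" lines in the log.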
Example 5-5 MCHK 660 IOD-Detected Failure (PCI Error) 


Logging OS 2. DIGITAL UNIX 
System Architecture 2. Alpha 
Event sequence number 
Timestamp of occurrence 27-AUG-1996 08:15:41 
Host name mason3 
System type register x00000016 AlphaStation 4x00 
Number of CPUs (mpnum) x00000003 
CPU logging event (mperr) x00000000 (1) 
Event validity 1. O/S claims event is valid 
Event severity 1. Severe Priority 

Entry type 100. CPU Machine Check Errors 
CPU Minor class 2. 660 Entry 

Software Flags x0000002300000000 


IOD 0 Register Subpkt Pres 
IOD 1 Register Subpkt Pres 
PCI 1 Bus Snapshot Present 


(If Cached CPU) 


Active CPUs x00000007 
Hardware Rev x00000000 
System Serial Number NI62503MWE 
Module Serial Number 
Module Type x0000 
System Revision x00000000 
* MCHK 660 Regs * 
Flags x00000000 
PCI Mask x0002 
Machine Check Reason x0202 IOD-Detected Hard Error -OR- 
DTag Parity Error 
PAL SHADOW REG 0 x0000000000000000 
PAL SHADOW REG 1 x0000000000000000 
PAL SHADOW REG 2 x0000000000000000 
PAL SHADOW REG 3 x0000000000000000 
PAL SHADOW REG 4 x0000000000000000 
PAL SHADOW REG 5 x0000000000000000 
PAL SHADOW REG 6 x0000000000000000 
PAL SHADOW REG 7 x0000000000000000 
PALTEMP0 xFFFFFFFFB589C000 
PALTEMP1 x0000000000000000 
PALTEMP2 xFFFFFC000043CDA0 
PALTEMP3 x0000000000007C00 
PALTEMP4 x0000000000000003 
PALTEMP5 x0000000000000000 
PALTEMP6 x000000000001C6AF 
PALTEMP7 xFFFFFC000043C820 
PALTEMP8 x1F1E171515020100 
PALTEMP9 xFFFFFC000043CB10 
PALTEMP10 xFFFFFC0000433E0C 
PALTEMP11 xFFFFFC000043C970 
PALTEMP12 xFFFFFC000043CD10 
PALTEMP13 x0000000000026E80 
PALTEMP14 x0000000000000000 
PALTEMP15 x00000000000E0000 
PALTEMP16 x0000020306600001 
PALTEMP17 x0000000000000000 
PALTEMP18 x0000000000000000 
PALTEMP19 xFFFFFFFFB589F958 
PALTEMP20 x00000000009D2000 
PALTEMP21 xFFFFFC000043CD40 
PALTEMP22 xFFFFFC000058D540 
PALTEMP23 x000000007FC67A58 
Exception Address Reg xFFFFFC0000433E0C 
Native-mode Instruction 
Exception PC x3FFFFF000010CF83 
Exception Summary Reg x0000000000000000 
Exception Mask Reg x0000000000000000 
PAL Base Address Reg x0000000000020000 
Base Addr for PALcode: x0000000000000008 
Interrupt Summary Reg x0000000000200000 
External HW Interrupt at IPL21 
AST Requests 3-0: x0000000000000000 
IBOX Ctrl and Status Reg x000000C160000000 
Timeout Counter Bit Clear. 
IBOX Timeout Counter Enabled. 
Floating Point Instructions will Cause 
FEN Exceptions. 
PAL Shadow Registers Enabled. 
Correctable Error Interrupts Enabled. 
ICACHE BIST (Self Test) Was Successful. 
TEST_STATUS_H Pin Asserted 
Icache Par Err Stat Reg x0000000000000000 
Dcache Par Err Stat Reg x0000000000000000 
Virtual Address Reg xFFFFFFFFB6ED3D38 
Memory Mgmt Flt Sts Reg x0000000000016211 
If Error, Reference Which Caused Was Write 
If Err, Reference Resulted in DTB Miss 
Fault Inst RA Field: x0000000000000008 
Fault Inst Opcode: x000000000000002C 
Scache Address Reg xFFFFFF000002502F 
Scache Status Reg x0000000000000000 
Bcache Tag Address Reg xFFFFFF8077AFAFFF 
Last Bcache Access Resulted in a Miss. 
Value of Parity Bit for Tag Control Status 
Bits Dirty, Shared & Valid is Set. 
Value of Tag Control Dirty Bit is Clear. 
Value of Tag Control Shared Bit is Set. 
Value of Tag Control Valid Bit is Set. 
Value of Parity Bit Covering Tag Store 
Address Bits is Set. 
Tag Address<38:20> Is: x000000000000077A 
Ext Interface Address Reg xFFFFFF007E00000F 
Fill Syndrome Reg x000000000000DE08 
Ext Interface Status Reg xFFFFFFF004FFFFFF (2) 
Error Occurred During D-ref Fill 
LD LOCK xFFFFFF0076750B8F 

** IOD SUBPACKET -> ** IOD 0 Register Subpacket 
WHOAMI x000008BA Module Revision 2. 
VCTY ASIC Rev = 0 
Bcache Size = 2MB 
MID 2. 
GID 7. 
Base Address of Bridge x000000F9E0000000 
Dev Type & Rev Register x06008231 CAP Chip Revision: x00000001 
HORSE Module Revision: x00000003 
SADDLE Module Revision: x00000002 
SADDLE Module Type: Left Hand 
PCI-EISA Bus Bridge Present on PCI Segment 
PCI Class Code x00000600 
MC-PCI Command Register x06470FB1 Module SelfTest Passed LED on 
Delayed PCI Bus Reads Protocol: Enabled 
Bridge to PCI Transactions: Enabled 
Bridge WILL NOT REQUEST 64 Bit Data Trans 
Bridge ACCEPTS 64 Bit Data Transactions 
PCI Address Parity Check: Enabled 
MC Bus CMD/Addr Parity Check: Enabled 
MC Bus NXM Check: Enabled 
Check ALL Transactions for Errors 
Use MC_BMSK for 16 Byte Align Blk Mem Wrt 
Wrt PEND_NUM Threshold: 7. 
RD_TYPE Memory Prefetch Algorithm: Short 
RL_TYPE Mem Rd Line Prefetch Type: Medium 
RM_TYPE Mem Rd Multiple Cmd Type: Long 
ARB_MODE Arbitration: MC-PCI Priority Mode 
Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27> x00000000 
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25> x00000000 
Interrupt Ctrl Register x00000003 Write Device Interrupt Info 
Struct :Enabled 
Interrupt Request x00000000 Interrupts asserted x00000000 
Interrupt Mask0 Register x00C50110 
Interrupt Mask1 Register x00000000 
MC Error Info Register 0 xE0000000 
MC Bus Trans Addr<31:4>: E0000000 
MC Error Info Register 1 x000E89FD MC bus trans addr <39:32> x000000FD 
MC Command is Read1-IO 
CPU0 Master at Time of Error 
Device ID 2 x00000002 
CAP Error Register x00000000 (3) 
Sys Environmental Regs x00000000 
PCI Bus Trans Error Adr x00000000 
MDPA Status Register x00000000 MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not Valid 
MDPB Status Register x00000000 MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not Valid 

** IOD SUBPACKET -> ** IOD 1 Register Subpacket 
WHOAMI x000008BA Module Revision 2. 
VCTY ASIC Rev = 0 
Bcache Size = 2MB 
MID 2. 
GID 7. 
Base Address of Bridge x000000FBE0000000 
Dev Type & Rev Register x06000231 CAP Chip Revision: x00000001 
HORSE Module Revision: x00000003 
SADDLE Module Revision: x00000002 
SADDLE Module Type: Left Hand 
Internal CAP Chip Arbiter: Enabled 
PCI Class Code x00000600 
MC-PCI Command Register x06470FB1 Module SelfTest Passed LED on 
Delayed PCI Bus Reads Protocol: Enabled 
Bridge to PCI Transactions: Enabled 
Bridge WILL NOT REQUEST 64 Bit Data Trans 
Bridge ACCEPTS 64 Bit Data Transactions 
PCI Address Parity Check: Enabled 
MC Bus CMD/Addr Parity Check: Enabled 
MC Bus NXM Check: Enabled 
Check ALL Transactions for Errors 
Use MC_BMSK for 16 Byte Align Blk Mem Wrt 
Wrt PEND_NUM Threshold: 7. 
RD_TYPE Memory Prefetch Algorithm: Short 
RL_TYPE Mem Rd Line Prefetch Type: Medium 
RM_TYPE Mem Rd Multiple Cmd Type: Long 
ARB_MODE Arbitration: MC-PCI Priority Mode 
Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27> x00000000 
IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25> x00000000 
Interrupt Ctrl Register x00000003 Write Device Interrupt Info 
Struct :Enabled 
Interrupt Request x00800000 Interrupts asserted x00000000 
Hard Error 
Interrupt Mask0 Register x00C51111 
Interrupt Mask1 Register x00000000 
MC Error Info Register 0 xE0000000 (5) 
MC Bus Trans Addr<31:4>: E0000000 
MC Error Info Register 1 x000E89FD MC bus trans addr <39:32> x000000FD 
MC Command is Read1-IO (6) 
CPU0 Master at Time of Error 
Device ID 2 x00000002 
CAP Error Register x00000012 Serious error (4) 
PCI error address reg locked 
Sys Environmental Regs x00000000 
PCI Bus Trans Error Adr xC0BB6000 
MDPA Status Register x00000000 MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Reg. Data Not Valid 
MDPB Status Register x00000000 MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Reg. Data Not Valid 
PALcode Revision Palcode Rev: 1.21-3 

** PCI SUBPACKET -> ** PCI 1 Subpacket 
Node Qty 5. (7) 

CONFIG Address x000000FBC0000800 
Device and Vendor ID x00011000 NCR 53C810 NCR_810 SCSI Narrow 
SingleEnded 
Vendor ID: x1000 (NCR) 
Device ID: x00000001 
Command Register x0147 I/O Space Accesses Response: Enabled 
Memory Space Accesses Response: Enabled 
PCI Bus Master Capability: Enabled 
Monitor for Special Cycle Ops: DISABLED 
Generate Mem Wrt/Invalidate Cmds: DISABLED 
Parity Error Detection Response: Normal 
Wait Cycle Address/Data Stepping: DISABLED 
SERR# Sys Err Driver Capability: Enabled 
Fast Back-to-Back to Many Target: DISABLED 
Status Register x0200 Device is 33 Mhz Capable. 
No Support for User Defineable Features. 
Fast Back-to-Back to Different Targets, 
Is Not Supported in Target Device. 
Device Select Timing: Medium. 
Revision ID x02 
Device Class Code x010000 Mass Storage: SCSI Bus Controller 
Cache Line S x00 
Latency T. xFF 
Header Type x00 Single Function Device 
Bist x00 
Base Address Register 1 x00101300 
Base Address Register 2 x01119300 
Base Address Register 3 x00000000 
Base Address Register 4 x00000000 
Base Address Register 5 x00000000 
Base Address Register 6 x00000000 
Expansion Rom Base Addres x00000000 
Interrupt P1 x04 
Interrupt P2 x01 
Min Gnt x00 
Max Lat x00 

CONFIG Address x000000FBC0001000 
Device and Vendor ID x00011069 Mylex DAC960 KZPSC RAID Controller 
Vendor ID: x1069 (Mylex) 
Device ID: x00000001 
Command Register x0147 I/O Space Accesses Response: Enabled 
Memory Space Accesses Response: Enabled 
PCI Bus Master Capability: Enabled 
Monitor for Special Cycle Ops: DISABLED 
Generate Mem Wrt/Invalidate Cmds: DISABLED 
Parity Error Detection Response: Normal 
Wait Cycle Address/Data Stepping: DISABLED 
SERR# Sys Err Driver Capability: Enabled 
Fast Back-to-Back to Many Target: DISABLED 
Status Register xC200 Device is 33 Mhz Capable. 
No Support for User Defineable Features. 
Fast Back-to-Back to Different Targets, 
Is Not Supported in Target Device. 
Device Select Timing: Medium. 
SIGNALED SYSTEM ERROR: This Device has Set 
A System Error on SERR# Line. 
DETECTED PARITY ERROR:This Device Detected 
Revision ID x02 
Device Class Code x010400 
Cache Line S x10 
Latency T. xFF 
Header Type x00 Single Function Device 
Bist x00 
Base Address Register 1 x00101200 
Base Address Register 2 x01119200 
Base Address Register 3 x00000000 
Base Address Register 4 x00000000 
Base Address Register 5 x00000000 
Base Address Register 6 x00000000 
Expansion Rom Base Addres x01110000 
Interrupt P1 x08 
Interrupt P2 x01 
Min Gnt x04 
Max Lat x00 

CONFIG Address x000000FBC0001800 
Device and Vendor ID x00011000 NCR 53C810 NCR_810 SCSI Narrow 
SingleEnded 
Vendor ID: x1000 (NCR) 
Device ID: x00000001 
Command Register x0147 I/O Space Accesses Response: Enabled 
Memory Space Accesses Response: Enabled 
PCI Bus Master Capability: Enabled 
Monitor for Special Cycle Ops: DISABLED 
Generate Mem Wrt/Invalidate Cmds: DISABLED 
Parity Error Detection Response: Normal 
Wait Cycle Address/Data Stepping: DISABLED 
SERR# Sys Err Driver Capability: Enabled 
Fast Back-to-Back to Many Target: DISABLED 
Status Register x0200 Device is 33 Mhz Capable. 
No Support for User Defineable Features. 
Fast Back-to-Back to Different Targets, 
Is Not Supported in Target Device. 
Device Select Timing: Medium. 
Revision ID x02 
Device Class Code x010000 Mass Storage: SCSI Bus Controller 
Cache Line S x00 
Latency T. xFF 
Header Type x00 Single Function Device 
Bist x00 
Base Address Register 1 x00101100 
Base Address Register 2 x01119100 
Base Address Register 3 x00000000 
Base Address Register 4 x00000000 
Base Address Register 5 x00000000 
Base Address Register 6 x00000000 
Expansion Rom Base Addres x00000000 
Interrupt P1 x0C 
Interrupt P2 x01 
Min Gnt x00 
Max Lat x00 

CONFIG Address x000000FBC0002000 
Device and Vendor ID x00091011 DECchip 21140 10/100Mhz TULIP 
Ethernet 
Vendor ID: x1011 (Digital Equip Corp) 
Device ID: x00000009 
Command Register x0147 I/O Space Accesses Response: Enabled 
Memory Space Accesses Response: Enabled 
PCI Bus Master Capability: Enabled 
Monitor for Special Cycle Ops: DISABLED 
Generate Mem Wrt/Invalidate Cmds: DISABLED 
Parity Error Detection Response: Normal 
Wait Cycle Address/Data Stepping: DISABLED 
SERR# Sys Err Driver Capability: Enabled 
Fast Back-to-Back to Many Target: DISABLED 
Status Register x0280 Device is 33 Mhz Capable. 
No Support for User Defineable Features. 
Fast Back-to-Back to Different Targets, 
Is Supported in Target Device. 
Device Select Timing: Medium. 
Revision ID x12 
Device Class Code x020000 Network Controller: 
Ethernet Controller 
Cache Line S x00 
Latency T. xFF 
Header Type x00 Single Function Device 
Bist x00 
Base Address Register 1 x00101000 
Base Address Register 2 x01119000 
Base Address Register 3 x00000000 
Base Address Register 4 x00000000 
Base Address Register 5 x00000000 
Base Address Register 6 x00000000 
Expansion Rom Base Addres x00000000 
Interrupt P1 x10 
Interrupt P2 x01 
Min Gnt x00 
Max Lat x00 

CONFIG Address x000000FBC0002800 
Device and Vendor ID x00081011 DEC_KZPSA Fast-Wide-Differential SCSI 
Vendor ID: x1011 (Digital Equip Corp) 
Device ID: x00000008 
Command Register x0147 I/O Space Accesses Response: Enabled 
Memory Space Accesses Response: Enabled 
PCI Bus Master Capability: Enabled 
Monitor for Special Cycle Ops: DISABLED 
Generate Mem Wrt/Invalidate Cmds: DISABLED 
Parity Error Detection Response: Normal 
Wait Cycle Address/Data Stepping: DISABLED 
SERR# Sys Err Driver Capability: Enabled 
Fast Back-to-Back to Many Target: DISABLED 
Status Register xE2C0 Device is 33 Mhz Capable. 
Device Supports User Defineable Features. 
Fast Back-to-Back to Different Targets, 
Is Supported in Target Device. 
Device Select Timing: Medium. 
RECEIVED MASTER-ABORT:Master Sets When Its 
Transaction Terminated by MasterAbort. 
SIGNALED SYSTEM ERROR: This Device has Set 
A System Error on SERR# Line. 
DETECTED PARITY ERROR:This Device Detected 
Revision ID x00 
Device Class Code x010000 Mass Storage: SCSI Bus Controller 
Cache Line S x10 
Latency T. xFF 
Header Type x00 Single Function Device 
Bist x80 
Base Address Register 1 x01118000 
Base Address Register 2 x00000000 
Base Address Register 3 x00100000 
Base Address Register 4 x01000000 
Base Address Register 5 x00000000 
Base Address Register 6 x00000000 
Expansion Rom Base Addres x01100000 
Interrupt P1 x14 
Interrupt P2 x01 
Min Gnt x08 
Max Lat x7E 


5.3.6 MCHK 630 Correctable CPU Error 


The error log in Example 5-6 shows the following: 

(1) CPU0 logged the error in a system with two CPUs. 

(2) During a D-ref fill, the External Interface Status Register shows no error 
    but states that the “data source is B-cache.” (When a CPU chip does not 
    find data it needs to perform a task in any of its caches, it requests data 
    from off the chip to fill its D-cache. It performs a D-ref fill.) 

(3) Both IOD CAP Error Registers logged no error. 

(4) The FIL Syndrome Register has a valid ECC code for the lower half of the 
    data. 


CPUs detect machine check 630s either when they take data off the system bus or 
when they access their own B-cache. In this case the data did not come from the 
system bus; otherwise bit <30> would be set in the External Interface Status 
Register. CPU0 had a single-bit, ECC-correctable error. 


NOTE: The error log example has been edited to decrease its size; registers of 
interest are in bold type. The “Horse” module referred to in the error log is the 
system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is 
the PCI motherboard, the B3050 module. The “MC” bus is the system bus. 


Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 
for information on node IDs. 
Example 5-6 MCHK 630 Correctable CPU Error 


Logging OS 2. DIGITAL UNIX 
System Architecture 2. Alpha 
Event sequence number 415. 
Timestamp of occurrence 09-MAY-1996 14:56:30 
Host name whip16 
System type register x00000016 AlphaStation 4x00 
Number of CPUs (mpnum) x00000002 
CPU logging event (mperr) x00000000 (1) 
Event validity 1. O/S claims event is valid 
Event severity 3. High Priority 
Entry type 100. CPU Machine Check Errors 
CPU Minor class 3. Bcache error (630 entry) 
Software Flags x00000000 
Active CPUs x00000003 
Hardware Rev x00000000 
System Serial Number C1563 
Module Serial Number 
Module Type x0000 
System Revision x00000000 
Machine Check Reason x0086 Alpha Chip Detected ECC Err, From 
B-Cache 
EI STAT XFFFFFFFOO4FFFFFFE 
DATA SOURCE IS BCACHE (2) 
D-ref fill 
EV5 Chip Rev 4 
EI ADDRESS xFFFFFFO00138D85EF 
FIL SYNDROME x00000000000800 (4) 
ISR x0000000100200000 
WHOAMI x00000000 Module Revision 0. 
MID 0. 
GID 0. 
Sys Environmental Regs x00000000 
Base Addr of Bridge x00000000 
Dev Type & Rev Register x00000000 CAP Chip Revision: x00000000 
Horse Module Revision: x00000000 
Saddle Module Revision: x00000000 
Saddle Module Type: LeftHand 
Internal CAP Chip Arbiter: Enabled 
PCI Class Code x00000000 
MC Error Info Register 0 x00000000 
MC Bus Trans Addr<31:4>: 0 
MC Error Info Register 1 x00000000 MC bus trans addr <39:32> x00000000 
MC Command is Illegal 
Illegal 
Device ID 2 x00000000 
CAP Error Register x00000000 (3) 
MDPA Status Register x00000000 MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not Valid 
MDPB Status Register x00000000 MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not Valid 


PALcode Revision 


Palcode Rev: 1.21-3 
5.3.7 MCHK 620 Correctable Error 

The MCHK 620 error is a correctable error detected by the IOD. 

The error log in Example 5-7 shows the following: 

(1) CPU0 logged the error in a system with two CPUs. 

(2) The External Interface Status Register is not valid. 

(3) The MC Error Info Registers 0 and 1 captured the error information. 

(4) The commander at the time of the error was CPU0. 

(5) The command at the time of the error was a write-back memory command. 


The IOD detected a recoverable error on the system bus. The MC command at the 
time of the error is a Write Back-Mem command (x00000016). The system bus 
commander at the time of the error is CPU0. Since this is a write, the defective FRU 
is CPU0. 


NOTE: The error log example has been edited to decrease its size; registers of 
interest are in bold type. The “Horse” module referred to in the error log is the 
system bus to PCI bus bridge module, the B3040 module. The “Saddle” module is 
the PCI motherboard, the B3050 module. The “MC” bus is the system bus. 


Refer to Table 5-9 for information on decoding commands, and refer to Table 5-10 
for information on node IDs. 


Example 5-7 MCHK 620 Correctable Error 


Logging OS 2. DIGITAL UNIX 

System Architecture 2. Alpha 

Event sequence number 32% 

Timestamp of occurrence 28-JUN-1996 19:45:42 

Host name sect06 

System type register x00000016 AlphaStation 4x00 

Number of CPUs (mpnum) x00000002 

CPU logging event (mperr) x00000000 (1) 
Event validity 1. O/S claims event is valid 
Event severity 5. Low Priority 

Entry type 100. CPU Machine Check Errors 
CPU Minor class 4. 620 System Correctable Error 
Software Flags x0000000000000000 

Active CPUs x00000003 

Hardware Rev x00000000 

System Serial Number C1563 

Module Serial Number 

Module Type x0000 
System Revision x00000000 


Machine Check Reason x0204 IOD Detected Soft Error 
Ext Interface Status Reg x0000000000000000 


Not Valid for 620 System (2) 
Correctable Errors 

Ext Interface Address Reg x0000000000000000 
Not Valid for 620 System 
Correctable Errors 

Fill Syndrome Reg x0000000000000000 
Not Valid for 620 System 
Correctable Errors 

Interrupt Summary Reg x0000000000000000 
Not Valid for 620 System 
Correctable Errors 


WHOAMI x00000000 Module Revision 0. 
MID 0. 
GID 0. 

Sys Environmental Regs x00000000 

Base Addr of Bridge x000000FBE0000000 

Dev Type & Rev Register x06000032 CAP Chip Revision: x00000002 
HORSE Module Revision: x00000003 
SADDLE Module Revision: x00000000 
SADDLE Module Type: LeftHand 
Internal CAP Chip Arbiter: Enabled 
PCI Class Code x00000600 


MC Error Info Register 0 x122D5640 
MC Bus Trans Addr<31:4>: 122D5640 
MC Error Info Register 1 x800E9600 MC bus trans addr <39:32> x00000000 


MC Command is WriteBack Mem (5) 
CPUO Master at Time of Error 


Device ID 2 x00000002 (4) 
MC error info valid 
CAP Error Register x89000000 Error Detected but Not Logged 
Correctable ECC err det by MDPA (3) 
MC error info latched 


MDPA Status Register x00000000 MDPA Status Register Data Not Valid 
MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not Valid 

MDPB Status Register x00000000 MDPB Status Register Data Not Valid 
MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not Valid 
PALcode Revision Palcode Rev: 0.0-1 
5.4 Troubleshooting IOD-Detected Errors 


Step 1 


Read the CAP Error Registers on both PCI bridges (F9E0000880 and FBE0000880). 
If either register shows an error, match the register contents with the data 
pattern in Table 5-3 and perform the action indicated. 


Table 5-3 CAP Error Register Data Pattern 


Data Pattern                             Most Likely Cause                 Action 

110x x00x x000 0000 0000 0000 000x xxxx  RDSB - Uncorrectable ECC error    Go to Step 2 
                                         detected on upper QW of MC bus 
                                         (D<127:64>) 

101x x00x x000 0000 0000 0000 000x xxxx  RDSA - Uncorrectable ECC error    Go to Step 2 
                                         detected on lower QW of MC bus 
                                         (D<63:0>) 

111x x00x x000 0000 0000 0000 000x xxxx  RDS detected in both QWs          Go to Step 2 

1001 1000 x000 0000 0000 0000 000x xxxx  CRDB - Correctable ECC error      Go to Step 2 
                                         detected on upper QW of MC bus 
                                         (D<127:64>) 

1000 0000 x000 0000 0000 0000 000x xxxx  CRDA - Correctable ECC error      Go to Step 2 
                                         detected on lower QW of MC bus 
                                         (D<63:0>) 

1001 1000 x000 0000 0000 0000 000x xxxx  CRD detected in both QWs 

100x x10x x000 0000 0000 0000 000x xxxx  NXM - Nonexistent MC bus          Go to Step 3 
                                         address 

100x x01x x000 0000 0000 0000 000x xxxx  MC_ADR_PERR - MC bus address      Go to Step 4 
                                         parity error 

100x x00x 1000 0000 0000 0000 000x xxxx  PIO_OVFL - PIO buffer overflow    Go to Step 5 

0000 0000 0000 0000 0000 0000 0001 1xxx  PTE_INV - Page table entry is     Go to Step 6 
                                         invalid 

0000 0000 0000 0000 0000 0000 0001 x1xx  MAB - Master abort                Go to Step 7 

0000 0000 0000 0000 0000 0000 0001 xx1x  SERR - PCI system error           Go to Step 8 

0000 0000 0000 0000 0000 0000 0001 xxx1  PERR - PCI parity error           Go to Step 9 
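
Matching a register value against the data patterns of Table 5-3 can be sketched as follows. This is a minimal illustration, not a service tool: the function names are hypothetical, only a subset of the rows is included (the correctable-ECC rows are left out), and each pattern is read as 32 bits from bit <31> down to bit <0> with x meaning "don't care".

```python
def matches(pattern, value):
    """Return True if a 32-bit value fits a pattern of 0, 1 and x (don't care)."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("pattern must describe exactly 32 bits")
    for position, symbol in enumerate(bits):
        bit = (value >> (31 - position)) & 1   # leftmost symbol is bit <31>
        if symbol in "01" and bit != int(symbol):
            return False
    return True


# Subset of the Table 5-3 rows: (data pattern, most likely cause, next step).
CAP_ERROR_PATTERNS = [
    ("110x x00x x000 0000 0000 0000 000x xxxx", "RDSB", 2),
    ("101x x00x x000 0000 0000 0000 000x xxxx", "RDSA", 2),
    ("111x x00x x000 0000 0000 0000 000x xxxx", "RDS in both QWs", 2),
    ("100x x10x x000 0000 0000 0000 000x xxxx", "NXM", 3),
    ("100x x01x x000 0000 0000 0000 000x xxxx", "MC_ADR_PERR", 4),
    ("100x x00x 1000 0000 0000 0000 000x xxxx", "PIO_OVFL", 5),
    ("0000 0000 0000 0000 0000 0000 0001 1xxx", "PTE_INV", 6),
    ("0000 0000 0000 0000 0000 0000 0001 x1xx", "MAB", 7),
    ("0000 0000 0000 0000 0000 0000 0001 xx1x", "SERR", 8),
    ("0000 0000 0000 0000 0000 0000 0001 xxx1", "PERR", 9),
]


def classify_cap_error(value):
    """List every (cause, step) whose pattern fits the CAP Error value."""
    return [(cause, step)
            for pattern, cause, step in CAP_ERROR_PATTERNS
            if matches(pattern, value)]


if __name__ == "__main__":
    print(classify_cap_error(0x00000012))  # CAP Error value for IOD1 in Example 5-5
```

The function returns a list because a register value with several low-order error bits set can fit more than one row.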
5.4.1 System Bus ECC Error 


Step 2 


Read the MC_ERR1 register and match the contents with the data pattern. Perform 
the action indicated. 


Table 5-4 System Bus ECC Error Data Pattern 


MC_ERR1 Data Pattern                     Most Likely Cause          Action 

For Memory Read 
1000 0000 0000 xxxx xxxx 10xx 0xxx xxxx  Bad nondirty data from     Go to Step 10 
                                         memory (bad memory) 
1000 0000 0000 xxxx xxxx 111x 0xxx xxxx  Bad nondirty data from     Go to Step 10 
                                         memory (bad memory) 
1000 0000 0001 xxxx xxxx 10xx 0xxx xxxx  Bad dirty data from a CPU  Replace CPU(s) 
1000 0000 0001 xxxx xxxx 111x 0xxx xxxx  Bad dirty data from a CPU  Replace CPU(s) 

For Memory or I/O Write 
1000 0000 000x xxx0 10xx 011x xxxx xxxx  Bad data from MID = 2      Replace CPU0 
1000 0000 000x xxx0 11xx 011x xxxx xxxx  Bad data from MID = 3      Replace CPU1 
1000 0000 000x xxx1 00xx 011x xxxx xxxx  Bad data from MID = 4      Replace IOD0 
1000 0000 000x xxx1 01xx 011x xxxx xxxx  Bad data from MID = 5      Replace IOD0 
1000 0000 000x xxx1 10xx 011x xxxx xxxx  Bad data from MID = 6      Replace CPU2 
                                                                    or IOD1 
1000 0000 000x xxx1 11xx 011x xxxx xxxx  Bad data from MID = 7      Replace CPU3 
                                                                    or IOD1 

For Memory Fill Transactions 
1000 0000 000x xxx1 00xx 110x xxxx xxxx  Bad data from MID = 4      Replace IOD0 
1000 0000 000x xxx1 01xx 110x xxxx xxxx  Bad data from MID = 5      Replace IOD0 
1000 0000 000x xxx1 10xx 110x xxxx xxxx  Bad data from MID = 6      Replace IOD1 
1000 0000 000x xxx1 11xx 110x xxxx xxxx  Bad data from MID = 7      Replace IOD1 


NOTE: IOD0 = B3040-AA bridge module; IOD1 = B3040-AB bridge module. 
5.4.2 System Bus Nonexistent Address Error 


Step 3 


Determine which node (if any) should have responded to the command/address 
identified in MC_ERR1. Perform the action indicated. 


Table 5-5 System Bus Nonexistent Address Error Troubleshooting 


MC_ERR1 Data Pattern                     Most Likely Cause          Action 

1000 0000 000x xxxx xxxx xxxx 0xxx xxxx  Software generated an MC   Fix software 
                                         ADDR > TOP_OF_MEM reg 
1000 0000 0000 xxxx xxxx xxxx 1xxx 100x  PCI0 bridge did not        Replace IOD0 
                                         respond 
1000 0000 0000 xxxx xxxx xxxx 1xxx 101x  PCI1 bridge did not        Replace IOD0 
                                         respond 
1000 0000 0000 xxxx xxxx xxxx 1xxx 110x  PCI2 bridge did not        Replace IOD1 
                                         respond 
1000 0000 0000 xxxx xxxx xxxx 1xxx 111x  PCI3 bridge did not        Replace IOD1 
                                         respond 


NOTE: IOD0 = B3040-AA bridge module; IOD1 = B3040-AB bridge module. 
5.4.3 System Bus Address Parity Error 


Step 4 


Determine which node put the bad command/address, identified in MC_ERR1, on the 
system bus. Perform the action indicated. 


Table 5-6 Address Parity Error Troubleshooting 


MC_ERR1 Data Pattern                     Most Likely Cause          Action 

1000 0000 000x xxx0 10xx xxxx xxxx xxxx  Data sourced by MID = 2    Replace CPU0 
1000 0000 000x xxx0 11xx xxxx xxxx xxxx  Data sourced by MID = 3    Replace CPU1 
1000 0000 000x xxx1 00xx xxxx xxxx xxxx  Data sourced by MID = 4    Replace IOD0 
1000 0000 000x xxx1 01xx xxxx xxxx xxxx  Data sourced by MID = 5    Replace IOD0 
1000 0000 000x xxx1 10xx xxxx xxxx xxxx  Data sourced by MID = 6    Replace CPU2 
                                                                    or IOD1 
1000 0000 000x xxx1 11xx xxxx xxxx xxxx  Data sourced by MID = 7    Replace CPU3 
                                                                    or IOD1 


NOTE: IOD0 = B3040-AA bridge module; IOD1 = B3040-AB bridge module. 
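
The lookup in Table 5-6 amounts to reading the three MID bits that the patterns single out (bits <16:14> of MC_ERR1) and mapping them to a FRU. A minimal sketch, with a hypothetical function name and the FRU strings copied from the table:

```python
# MID value -> FRU to replace, as listed in Table 5-6.
MID_TO_FRU = {
    2: "CPU0",
    3: "CPU1",
    4: "IOD0",
    5: "IOD0",
    6: "CPU2 or IOD1",
    7: "CPU3 or IOD1",
}


def fru_for_address_parity_error(mc_err1):
    """Return (MID, FRU) for an MC_ERR1 value; FRU is None if the MID is not listed."""
    mid = (mc_err1 >> 14) & 0x7    # bits <16:14> vary across the table rows
    return mid, MID_TO_FRU.get(mid)


if __name__ == "__main__":
    # Illustrative value only: bit <31> set and MID bits = 111.
    print(fru_for_address_parity_error(0x8001C000))
```

MIDs 6 and 7 map to two candidate FRUs, so the sketch narrows the search but does not replace the judgment called for in the table.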
5.4.4 PIO Buffer Overflow Error (PIO_OVFL) 


Step 5 


Read bits <19:16> of the CAP_CTRL register (Actual_PEND_NUM) and calculate 
Expected_PEND_NUM with the following formula. Compare the two values as 
indicated in Table 5-7 to determine the most likely cause of the error. When an 
IOD is implicated in the analysis of the error, replace the one that captured the 
error in its CAP Error Register. 


Expected_PEND_NUM = 12 - ((2 * (X - 1)) + Y) 
Where: X = Number of PCIs 
       Y = Number of CPUs 


Table 5-7 Cause of PIO_OVFL Error 


Comparison                              Most Likely Cause          Action 

Actual_PEND_NUM = Expected_PEND_NUM     Broken hardware on IOD     Replace IOD 

Actual_PEND_NUM < Expected_PEND_NUM     Broken hardware on IOD     Replace IOD 

Actual_PEND_NUM > Expected_PEND_NUM     PEND_NUM setup incorrect   Fix the software 


NOTE: IOD0 = B3040-AA bridge module; IOD1 = B3040-AB bridge module. 
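
The comparison in Step 5 can be worked through with a short sketch. The function names are hypothetical, and the example assumes that the MC-PCI Command Register value printed in the error log examples is the CAP_CTRL register referred to above.

```python
def expected_pend_num(num_pcis, num_cpus):
    """Expected_PEND_NUM = 12 - ((2 * (X - 1)) + Y), X = PCIs, Y = CPUs."""
    return 12 - ((2 * (num_pcis - 1)) + num_cpus)


def actual_pend_num(cap_ctrl):
    """Extract bits <19:16> of the CAP_CTRL register value."""
    return (cap_ctrl >> 16) & 0xF


def pio_ovfl_cause(cap_ctrl, num_pcis, num_cpus):
    """Apply the Table 5-7 comparison and return cause and action as text."""
    actual = actual_pend_num(cap_ctrl)
    expected = expected_pend_num(num_pcis, num_cpus)
    if actual > expected:
        return "PEND_NUM setup incorrect: fix the software"
    # Equal to or less than the expected value both point at the IOD.
    return "Broken hardware on IOD: replace IOD"


if __name__ == "__main__":
    # Assumed inputs: register value x06470FB1, two PCIs, three CPUs.
    print(actual_pend_num(0x06470FB1), expected_pend_num(2, 3))
    print(pio_ovfl_cause(0x06470FB1, 2, 3))
```

With two PCIs and three CPUs the formula gives 12 - (2 + 3) = 7, which equals the value in bits <19:16> of x06470FB1.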
5.4.5 Page Table Entry Invalid Error 


Step 6 


This error is almost always a software problem. However, if the software is known 
to be good and the hardware is suspected, swap the IOD. 


5.4.6 PCI Master Abort 


Step 7 


Master aborts normally occur when the operating system is sizing the PCI bus. 
However, if the master abort occurs after the system is booted, read PCI_ERR1 and 
determine which PCI device should have responded to this PCI address. Replace 
this device. 


5.4.7 PCI System Error 


Step 8 


This error occurs when a PCI device asserts SERR. Read the error registers in all 
the PCI devices to determine which device asserted it. The PCI device that set 
SERR should have information logged in its error registers that indicates the 
failing device. 


5.4.8 PCI Parity Error 


Step 9 


Read PCI_ERR1 and determine which PCI device normally uses that PCI address 
space. Replace that device. Also, read the error registers in all the PCI devices to 
determine which device was driving the PCI bus when the parity error occurred. 
5.4.9 Broken Memory 


Step 10 


Refer to the following sections. 


For a Read Data Substitute Error (uncorrectable ECC error) 


When a read data substitute (RDS) error occurs, determine which memory module 
pair caused the error as follows: 

1. Run the memory diagnostic to see if it catches the bad memory. If so, replace 
   the memory module that it reports as bad. 

2. At the SRM console prompt, enter the show mem command. 

   P00>>> show mem 

   This command displays the base address and size of the memory module pair for 
   each slot. 

   OR 

   Read the configuration packet, found in the error log, to retrieve the base 
   address and size of the memory module pair. 

3. Compare this address to the failing address from the MC_ERR1 and MC_ERR0 
   Registers to determine which memory slot is failing. 

4. Replace both memory modules (high and low) for that slot. For an RDS error, 
   there is no way to know which memory module (high or low) is bad. 
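
The address comparison in the procedure above can be sketched as follows. The way the failing address is assembled (bits <31:4> from MC_ERR0, bits <39:32> from the low byte of MC_ERR1) is an assumption inferred from the decoded fields in the error log examples. The slot table in the example is invented for illustration and would in practice come from the show mem display or the configuration packet. All names are hypothetical.

```python
def failing_address(mc_err0, mc_err1):
    """Combine MC_ERR0 (address <31:4>) and MC_ERR1 (address <39:32>)."""
    return ((mc_err1 & 0xFF) << 32) | (mc_err0 & 0xFFFFFFF0)


def find_memory_pair(address, pairs):
    """Return the slot whose [base, base + size) range holds the address.

    pairs is an iterable of (slot, base_address, size_in_bytes) tuples.
    """
    for slot, base, size in pairs:
        if base <= address < base + size:
            return slot
    return None


if __name__ == "__main__":
    # Hypothetical memory layout: two pairs of 1 GB each.
    pairs = [(0, 0x00000000, 0x40000000), (1, 0x40000000, 0x40000000)]
    address = failing_address(0x4A26DBF0, 0x800ED600)
    print(hex(address), find_memory_pair(address, pairs))
```

An address that falls outside every listed range returns None, which would suggest rechecking the register values or the slot table rather than replacing memory.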


For a Corrected Read Data Error (CRD) 


When a CRD error occurs, determine which memory module pair caused the error as 
follows: 

1. At the SRM console prompt, enter the show mem command. This command 
   displays the base address and size of the memory module pair for each slot. 

   P00>>> show mem 

2. Compare this address to the failing address from the MC_ERR1 and MC_ERR0 
   Registers to determine which memory slot is failing. 

3. When you have isolated the failing memory pair, determine which of the two 
   modules is bad. (You cannot do this if the operating system is Windows NT.) 
   Read the CPU FIL SYNDROME Register. If this register is non-zero, use the 
   ECC syndrome bits in Table 5-8 to determine which module had the single-bit 
   error. 


Table 5-8 ECC Syndrome Bits Table


MDP Syndrome Values for Low-Order Memory

01  02  04  08  10  20  40  CE
CB  D3  D5  D6  D9  DA  DC  25
26  29  2C  31  13  19  4F  52
54  57  58  5B  5D  A2  A4  B0

MDP Syndrome Values for High-Order Memory

2A  34  0E  0B  15  16  1A      E3
E5  E6  E9  EA  EC  F1  F4  AB
AD  B5  8F  8A  92  94  97  9B
9D  6B  6D  75  62  64  67  68
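
The lookup against Table 5-8 can be expressed as a small Python sketch. The syndrome values are transcribed from the table as printed here. Reading one entry as the pair E5 and E6 is an assumption, and one high-order cell is illegible and therefore omitted. The function name is hypothetical.

```python
# Syndrome values transcribed from Table 5-8 (see the assumptions noted above).
LOW_ORDER = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0xCE,
    0xCB, 0xD3, 0xD5, 0xD6, 0xD9, 0xDA, 0xDC, 0x25,
    0x26, 0x29, 0x2C, 0x31, 0x13, 0x19, 0x4F, 0x52,
    0x54, 0x57, 0x58, 0x5B, 0x5D, 0xA2, 0xA4, 0xB0,
}
HIGH_ORDER = {
    0x2A, 0x34, 0x0E, 0x0B, 0x15, 0x16, 0x1A, 0xE3,
    0xE5, 0xE6, 0xE9, 0xEA, 0xEC, 0xF1, 0xF4, 0xAB,
    0xAD, 0xB5, 0x8F, 0x8A, 0x92, 0x94, 0x97, 0x9B,
    0x9D, 0x6B, 0x6D, 0x75, 0x62, 0x64, 0x67, 0x68,
}


def module_for_syndrome(syndrome: int):
    """Map a single-bit-error syndrome to the low or high memory module.

    Returns None for a zero syndrome (no error recorded) and "unknown"
    when the value is not in either list.
    """
    if syndrome == 0:
        return None
    if syndrome in LOW_ORDER:
        return "low"
    if syndrome in HIGH_ORDER:
        return "high"
    return "unknown"


print(module_for_syndrome(0xCE))  # listed under low-order memory
print(module_for_syndrome(0x2A))  # listed under high-order memory
```

A result of "unknown" means the value is not in the table as transcribed, so check the printed table before drawing a conclusion.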
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5.4.10 Command Codes 


Table 5-9 shows the codes for transactions on the system bus and how they are 
affected by the commander in charge of the bus during the transaction. The 
command is a six-bit field in the command address (bits<5:0>). Bit-to-text 
translations give six-bit data (although the top two bits may or may not be relevant). 
Note that address bit<39> defines the command as being either a system space or an
I/O command.


Table 5-9 Decoding Commands 


MC_CMD        CMD in   MC_ADR                 No B-Cache   B-Cache
<5:4> <3:0>   Hex      <39>     Description   CPU          CPU       IOD

xx 0000 X0 1 Mem Idle Y Y
00 0010 02 1 Write Pend Ack Y
xx 0011 X3 1 Mem Refresh
xx 0101 X5 0 Set Dirty Y
x0 0110 0/26 0 Write Thru - Mem Y Y
x0 0110 0/26 1 Write Thru - I/O Y Y
x1 0110 3/16 0 Write Back - Mem Y Y
x1 0110 3/16 1 Write Intr - I/O Y
00 0111 07 0 Write Full - Mem Y
10 0111 27 0 Write Part - Mem (B-cache CPU only) Y
x0 0111 0/27 1 Write Mask - I/O Y
x0 0111 0/27 0 Write Merge - Mem Y
xx 1000 X8 0 Read0 - Mem Y Y Y
xx 1000 X8 1 Read0 - I/O Y Y
xx 1001 X9 0 Read1 - Mem Y Y Y
xx 1001 X9 1 Read1 - I/O Y Y
xx 1010 XA 0 Read Mod0 - Mem Y Y Y
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Table 5-9 Decoding Commands (continued) 


MC_CMD        CMD in   MC_ADR                 No B-Cache   B-Cache
<5:4> <3:0>   Hex      <39>     Description   CPU          CPU       IOD

xx 1010 XA 0 Read Mod0 - Mem Y Y Y
xx 1010 XA 1 Read Peer0 - I/O Y
xx 1011 XB 0 Read Mod1 - Mem Y Y Y
xx 1011 XB 1 Read Peer1 - I/O Y
10 1100 2C 1 FILL0 (due to Read0/Peer0) Y
10 1101 2D 1 FILL1 (due to Read1/Peer1) Y
xx 1110 XE 0 Read0 - Mem Y Y
xx 1111 XF 0 Read1 - Mem Y Y



5.4.11 Node IDs 


The node ID is a six-bit field in the command address (bits<38:33>). The high-order 
three bits are always set, and the last three indicate the node. Bit-to-text translations 
give six-bit data, although only the last three bits define the node. 


Table 5-10 Node IDs 


Node ID <2:0>   Six Bit (Hex)   Node (4000)   Node (4100)

000             38
001             39              Memory        Memory
010             3A              CPU0          CPU0
011             3B              CPU1          CPU1
100             3C              IOD0          IOD0
101             3D              IOD1          IOD1
110             3E              IOD2          CPU2
111             3F              IOD3          CPU3
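
As an illustration of Table 5-10, the following Python sketch translates the six-bit node ID into a node name. The mapping is taken from the table. The function name and the "4000"/"4100" selector strings are hypothetical.

```python
# Node names from Table 5-10, keyed by node ID <2:0>.
# Node ID 000 has no node listed in the table, so it is left out.
NODE_NAMES = {
    0b001: {"4000": "Memory", "4100": "Memory"},
    0b010: {"4000": "CPU0", "4100": "CPU0"},
    0b011: {"4000": "CPU1", "4100": "CPU1"},
    0b100: {"4000": "IOD0", "4100": "IOD0"},
    0b101: {"4000": "IOD1", "4100": "IOD1"},
    0b110: {"4000": "IOD2", "4100": "CPU2"},
    0b111: {"4000": "IOD3", "4100": "CPU3"},
}


def decode_node_id(six_bit: int, system: str = "4100"):
    """Return the node name for a six-bit node ID such as 0x3A.

    The high-order three bits are always set, so anything else is rejected.
    """
    if (six_bit >> 3) != 0b111:
        raise ValueError("high-order three bits of the node ID must be set")
    entry = NODE_NAMES.get(six_bit & 0b111)
    return entry[system] if entry else None


print(decode_node_id(0x3A))          # CPU0
print(decode_node_id(0x3E, "4000"))  # IOD2
print(decode_node_id(0x3E, "4100"))  # CPU2
```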
5.5 Double Error Halts and Machine Checks While
in PAL Mode


Two error cases require special attention. Neither double error halts nor
machine checks while the machine is in PAL mode result in error log entries.
Nevertheless, information is available that can help determine what error
occurred.


5.5.1 PALcode Overview 


PALcode, privileged architecture library code, is used to implement a number of 
functions at the machine level without the use of microcode. This allows operating 
systems to make common calls to PALcode routines without knowing the hardware 
specifics of each system the operating system is running on. PALcode routines 
handle: 


• Instructions that require complex sequencing, such as atomic operations
• Instructions that require VAX-style interlocked memory access
• Privileged instructions
• Memory management
• Context swapping
• Interrupt and exception dispatching
• Power-up initialization and booting
• Console functions
• Emulation of instructions with no hardware support
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5.5.2 Double Error Halt 


A double error halt occurs under the following conditions: 
• A machine check occurs.

• PAL completes its tasks and returns control of the system to the operating
  system.

• A second machine check occurs before the operating system completes its tasks.


The machine returns to the console and displays the following message: 


halt code = 6 

double error halt 

PC = 20000004 

Your system has halted due to an irrecoverable 
error. Record the error halt code and PC and 
contact your Digital Services representative. In 
addition, type INFO 5 and INFO 8 at the console and 
record the results. 


The info 5 command (Example 5-9) causes the SRM console to read the PAL-built 
logout area that contains all the data used by the operating system to create the error 
entry. 


The info 8 command (Example 5-10) causes the SRM console to read the IOD 0 and 
IOD 1 registers. 


5.5.3 Machine Checks While in PAL 


If a machine check occurs while the system is running PALcode, PALcode returns 
to the SRM console, not to the operating system. The SRM console writes the 
following message: 


halt code = 7 

machine check while in PAL mode 

PC = 20000004 

Your system has halted due to an irrecoverable 
error. Record the error halt code and PC and 
contact your Digital Services representative. In 
addition, type INFO 3 and INFO 8 at the console and 
record the results. 


The info 3 command (Example 5-8) causes the SRM console to read the “impure 
area,” which contains the state of the CPU before it entered PAL. 


Example 5-8 INFO 3 Command 


P00>>> info 3

cpu00

per_cpu impure area     00004400

cns$flag                00000001 : 0000
cns$flag+4              00000000 : 0004
cns$hlt                 00000000 : 0008
cns$hlt+4               00000000 : 000c
cns$mchkflag            00000228 : 0210
cns$mchkflag+4          00000000 : 0214
cns$exc_addr            20000004 : 0318
cns$exc_addr+4          00000000 : 031c
cns$pal_base            00000000 : 0320
cns$pal_base+4          00000000 : 0324
cns$mm_stat             0000da10 : 0338
cns$mm_stat+4           00000000 : 033c
cns$va                  00080000 : 0340
cns$va+4                00000002 : 0344
cns$icsr                40000000 : 0348
cns$icsr+4              000000c1 : 034c
cns$ipl                 0000001f : 0350
cns$ipl+4               00000000 : 0354
cns$ps                  00000000 : 0358
cns$ps+4                00000000 : 035c
cns$itb_asn             00000000 : 0360
cns$itb_asn+4           00000000 : 0364
cns$aster               00000000 : 0368
cns$aster+4             00000000 : 036c
cns$astrr               00000000 : 0370
cns$astrr+4             00000000 : 0374
cns$isr                 00400000 : 0378
cns$isr+4               00000000 : 037c
cns$ivptbr              00000000 : 0380
cns$ivptbr+4            00000002 : 0384
cns$mcsr                00000000 : 0388
cns$mcsr+4              00000000 : 038c
cns$dc_mode             00000001 : 0390
cns$dc_mode+4           00000000 : 0394
cns$maf_mode            00000080 : 0398
cns$maf_mode+4          00000000 : 039c
cns$sirr                00000000 : 03a0
cns$sirr+4              00000000 : 03a4
cns$fpcsr               00000000 : 03a8
cns$fpcsr+4             ff900000 : 03ac
cns$icperr_stat         00000000 : 03b0
cns$icperr_stat+4       00000000 : 03b4
cns$pmctr               00000000 : 03b8
cns$pmctr+4             00000000 : 03bc
cns$exc_sum             00000000 : 03c0
cns$exc_sum+4           00000000 : 03c4
cns$exc_mask            00000000 : 03c8
cns$exc_mask+4          00000000 : 03cc
cns$intid               00000016 : 03d0
cns$intid+4             00000000 : 03d4
cns$dcperr_stat         00000000 : 03d8
cns$dcperr_stat+4       00000000 : 03dc
cns$sc_stat             00000000 : 03e0
cns$sc_stat+4           00000000 : 03e4
cns$sc_addr             000047cf : 03e8
cns$sc_addr+4           ffffff00 : 03ec
cns$sc_ctl              0000f000 : 03f0
cns$sc_ctl+4            00000000 : 03f4
cns$bc_tag_addr         ff7fefff : 03f8
cns$bc_tag_addr+4       ffffffff : 03fc
cns$ei_stat             04ffffff : 0400
cns$ei_stat+4           fffffff0 : 0404
cns$fill_syn            000000a7 : 0410
cns$fill_syn+4          00000000 : 0414
cns$ld_lock             0004eaef : 0418
cns$ld_lock+4           ffffff00 : 041c
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Example 5-9 INFO 5 Command 


P00>>> info 5

cpu00

per_cpu logout area     00004838

mchk$crd_flag           00000320 : 0000
mchk$crd_flag+4         00000000 : 0004
mchk$crd_offsets        00000118 : 0008
mchk$crd_offsets+4      00001328 : 000c
mchk$crd_mchk_code      00980000 : 0010
mchk$crd_mchk_code+4    00000000 : 0014
mchk$crd_ei_stat        eba00003 : 0018
mchk$crd_ei_stat+4      4143040a : 001c
mchk$crd_ei_addr        d1200067 : 0020
mchk$crd_ei_addr+4      47f90416 : 0024
mchk$crd_fill_syn       eba00003 : 0028
mchk$crd_fill_syn+4     d1200068 : 002c
mchk$crd_isr            7ec38000 : 0030
mchk$crd_isr+4          63ff4000 : 0034
mchk$flag               00000320 : 0000
mchk$flag+4             00000000 : 0004
mchk$isr                00000000 : 0138
mchk$isr+4              00000000 : 013c
mchk$icsr               60000000 : 0140
mchk$icsr+4             000000c1 : 0144
mchk$ic_perr_stat       00000000 : 0148
mchk$ic_perr_stat+4     00000000 : 014c
mchk$dc_perr_stat       00000000 : 0150
mchk$dc_perr_stat+4     00000000 : 0154
mchk$va                 ff8000a0 : 0158
mchk$va+4               ffffffff : 015c
mchk$mm_stat            000149d0 : 0160
mchk$mm_stat+4          00000000 : 0164
mchk$sc_addr            0001904f : 0168
mchk$sc_addr+4          ffffff00 : 016c
mchk$sc_stat            00000000 : 0170
mchk$sc_stat+4          00000000 : 0174
mchk$bc_tag_addr        ff7fefff : 0178
mchk$bc_tag_addr+4      ffffffff : 017c
mchk$ei_addr            066bc3ef : 0180
mchk$ei_addr+4          ffffff00 : 0184
mchk$fill_syn           000000a7 : 0188
mchk$fill_syn+4         00000000 : 018c
mchk$ei_stat            04ffffff : 0190
mchk$ei_stat+4          fffffff0 : 0194
mchk$ld_lock            00005b6f : 0198
mchk$ld_lock+4          ffffff00 : 019c

IOD: 0 base address: f9e0000000

WHOAMI:    0000003a   PCI_REV:   06008221
CAP_CTL:   02490fb1   HAE_MEM:   00000000   HAE_IO:    00000000
INT_CTL:   00000003   INT_REQ:   00800000   INT_MASK0: 00010000
INT_MASK1: 00000000   MC_ERR0:   e0000000   MC_ERR1:   800e88fd
CAP_ERR:   84000000   PCI_ERR:   00000000   MDPA_STAT: 00000000
MDPA_SYN:  00000000   MDPB_STAT: 00000000   MDPB_SYN:  00000000

IOD: 1 base address: fbe0000000

WHOAMI:    0000003a   PCI_REV:   06000221
CAP_CTL:   02490fb1   HAE_MEM:   00000000   HAE_IO:    00000000
INT_CTL:   00000003   INT_REQ:   00800000   INT_MASK0: 00010000
INT_MASK1: 00000000   MC_ERR0:   e0000000   MC_ERR1:   800e88fd
CAP_ERR:   84000000   PCI_ERR:   00000000   MDPA_STAT: 00000000
MDPA_SYN:  00000000   MDPB_STAT: 00000000   MDPB_SYN:  00000000
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Example 5-10 INFO 8 Command 


P00>>> info 8

IOD 0

WHOAMI:    0000003a   PCI_REV:   06008221
CAP_CTL:   02490fb1   HAE_MEM:   00000000   HAE_IO:      00000000
INT_CTL:   00000003   INT_REQ:   00000000   INT_MASK0:   00210000
INT_MASK1: 00000000   MC_ERR0:   e0000000   MC_ERR1:     000e88fd
CAP_ERR:   00000000   PCI_ERR:   00000000   MDPA_STAT:   00000000
MDPA_SYN:  00000000   MDPB_STAT: 00000000   MDPB_SYN:    00000000
INT_TARG:  0000003a   INT_ADR:   00006000   INT_ADR_EXT: 00000000
PERF_MON:  00406ebf   PERF_CONT: 00000000   CAP_DIAG:    00000000
DIAG_CHKA: 10000000   DIAG_CHKB: 10000000   SCRATCH:     21011131
W0_BASE:   00100001   W0_MASK:   00000000   T0_BASE:     00001000
W1_BASE:   00800001   W1_MASK:   00700000   T1_BASE:     00008000
W2_BASE:   80000001   W2_MASK:   3ff00000   T2_BASE:     00000000
W3_BASE:   00000000   W3_MASK:   1ff00000   T3_BASE:     0000b800
W_DAC:     00000000   SG_TBIA:   00000000   HBASE:       00000000

IOD 1

WHOAMI:    0000003a   PCI_REV:   06000221
CAP_CTL:   02490fb1   HAE_MEM:   00000000   HAE_IO:      00000000
INT_CTL:   00000003   INT_REQ:   00000000   INT_MASK0:   00000000
INT_MASK1: 00000000   MC_ERR0:   e0000000   MC_ERR1:     000e88fd
CAP_ERR:   00000000   PCI_ERR:   00000000   MDPA_STAT:   00000000
MDPA_SYN:  00000000   MDPB_STAT: 00000000   MDPB_SYN:    00000000
INT_TARG:  0000003a   INT_ADR:   00006000   INT_ADR_EXT: 00000000
PERF_MON:  004e31a6   PERF_CONT: 00000000   CAP_DIAG:    00000000
DIAG_CHKA: 10000000   DIAG_CHKB: 10000000   SCRATCH:     00000000
W0_BASE:   00100001   W0_MASK:   00000000   T0_BASE:     00001000
W1_BASE:   00800001   W1_MASK:   00700000   T1_BASE:     00008000
W2_BASE:   80000001   W2_MASK:   3ff00000   T2_BASE:     00000000
W3_BASE:   00000000   W3_MASK:   1ff00000   T3_BASE:     0000a000
W_DAC:     00000000   SG_TBIA:   00000000   HBASE:       00000000


Chapter 6 
Error Registers 


This chapter describes the registers used to hold error information. These registers 
include: 


External Interface Status Register 
External Interface Address Register 
MC Error Information Register 0 
MC Error Information Register 1 
CAP Error Register 

PCI Error Status Register 1 
6.1 External Interface Status Register - EI_STAT


The EI_STAT register is a read-only register that is unlocked and cleared by 
any PALcode read. A read of this register also unlocks the EI_ADDR, 
BC_TAG_ADDR, and FILL_SYN registers subject to some restrictions. The 
EI_STAT register is not unlocked or cleared by reset. 


Address   FF FFF0 0168
Type      R

Bits      Field
<63:36>   All 1s
<35>      SEO_HRD_ERR
<34>      FIL_IRD
<33>      EI_PAR_ERR
<32>      UNC_ECC_ERR
<31>      COR_ECC_ERR
<30>      EI_ES
<29>      BC_TC_PERR
<28>      BC_TPERR
<27:24>   CHIP_ID <3:0>
<23:0>    All 1s

PKW0453-96
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Fill data from B-cache or main memory could have correctable or uncorrectable 
errors in ECC mode. In parity mode, fill data parity errors are treated as 
uncorrectable hard errors. System address/command parity errors are always treated 
as uncorrectable hard errors, irrespective of the mode. The sequence for reading, 
unlocking, and clearing EI_STAT, EI_ADDR, BC_TAG_ADDR, and FILL_SYN is
as follows:

1. Read the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers in any order.
   This operation does not unlock or clear any register.

2. Read the EI_STAT register. This operation unlocks the EI_ADDR,
   BC_TAG_ADDR, and FILL_SYN registers. It also unlocks the EI_STAT
   register subject to conditions given in Table 6-2, which defines the loading and
   locking rules for external interface registers.


NOTE: If the first error is correctable, the registers are loaded but not locked. On 
the second correctable error, the registers are neither loaded nor locked. 


Registers are locked on the first uncorrectable error except the second hard error 
bit. This bit is set only for an uncorrectable error that follows an uncorrectable 
error. A correctable error that follows an uncorrectable error is not logged as a 
second error. B-cache tag parity errors are uncorrectable in this context. 
Table 6-1 External Interface Status Register


Name          Bits      Type  Description

COR_ECC_ERR   <31>      R     Correctable ECC Error. Indicates that
                              fill data received from outside the CPU
                              contained a correctable ECC error.

EI_ES         <30>      R     External Interface Error Source. When
                              set, indicates that the error source is fill
                              data from main memory or a system
                              address/command parity error. When
                              clear, the error source is fill data from the
                              B-cache.

                              This bit is only meaningful when
                              <COR_ECC_ERR>, <UNC_ECC_ERR>,
                              or <EI_PAR_ERR> is set in this register.

                              This bit is not defined for a B-cache tag
                              error (BC_TPERR) or a B-cache tag
                              control parity error (BC_TC_PERR).

BC_TC_PERR    <29>      R     B-Cache Tag Control Parity Error.
                              Indicates that a B-cache read transaction
                              encountered bad parity in the tag control
                              RAM.

BC_TPERR      <28>      R     B-Cache Tag Address Parity Error.
                              Indicates that a B-cache read transaction
                              encountered bad parity in the tag address
                              RAM.

CHIP_ID       <27:24>   R     Chip Identification. Read as "4." Future
                              update revisions to the chip will return new
                              unique values.

              <23:0>          All ones.
Table 6-1 External Interface Status Register (continued)


Name          Bits      Type  Description

              <63:36>         All ones.

SEO_HRD_ERR   <35>      R     Second External Interface Hard Error.
                              Indicates that a fill from B-cache or main
                              memory, or a system address/command
                              received by the CPU has a hard error while
                              one of the hard error bits in the EI_STAT
                              register is already set.

FIL_IRD       <34>      R     Fill I-Ref D-Ref. When set, indicates that
                              the error occurred during an I-ref fill. When
                              clear, indicates that the error occurred during
                              a D-ref fill. This bit has meaning only when
                              one of the ECC or parity error bits is set.

                              This bit is not defined for a B-cache tag
                              parity error (BC_TPERR) or a B-cache tag
                              control parity error (BC_TC_PERR).

EI_PAR_ERR    <33>      R     External Interface Command/Address
                              Parity Error. Indicates that an address and
                              command received by the CPU has a parity
                              error.

UNC_ECC_ERR   <32>      R     Uncorrectable ECC Error. Indicates that
                              fill data received from outside the CPU
                              contained an uncorrectable ECC error. In
                              parity mode, this bit indicates a data parity
                              error.
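
The bit assignments in Table 6-1 can be turned into a small decoder. The Python below is a sketch under stated assumptions: the bit positions are from the table, the function name is hypothetical, and the 64-bit value is assembled from the two longwords that the console displays (low longword first, then the +4 longword).

```python
# Error bits of EI_STAT, from Table 6-1.
EI_STAT_BITS = {
    35: "SEO_HRD_ERR",
    34: "FIL_IRD",
    33: "EI_PAR_ERR",
    32: "UNC_ECC_ERR",
    31: "COR_ECC_ERR",
    30: "EI_ES",
    29: "BC_TC_PERR",
    28: "BC_TPERR",
}


def decode_ei_stat(low: int, high: int):
    """Return (chip_id, [names of set bits]) for the two EI_STAT longwords."""
    value = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    chip_id = (value >> 24) & 0xF
    flags = [
        name
        for bit, name in sorted(EI_STAT_BITS.items(), reverse=True)
        if (value >> bit) & 1
    ]
    return chip_id, flags


# The ei_stat values shown in Examples 5-8 and 5-9: no error bits set.
print(decode_ei_stat(0x04FFFFFF, 0xFFFFFFF0))
# A constructed value with UNC_ECC_ERR and EI_ES set.
print(decode_ei_stat(0x44FFFFFF, 0xFFFFFFF1))
```

Remember that EI_ES and FIL_IRD are status qualifiers rather than errors, and have meaning only when one of the ECC or parity error bits is set.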
6.1.1 External Interface Address Register - EI_ADDR


The EI_ADDR register contains the physical address associated with errors 
reported by the EI_STAT register. It is unlocked by a read of the EI_STAT 
Register. This register is meaningful only when one of the error bits is set. 


Address   FF FFF0 0148
Access    R

Bits      Field
<63:40>   All 1s
<39:4>    EI_ADDR <39:4>
<3:0>     All 1s

PKW0454-96
Table 6-2 Loading and Locking Rules for External Interface Registers


Correctable  Uncorrectable  Second        Load      Lock            Action When
Error        Error          Hard Error    Register  Register        EI_STAT Is Read

0            0              Not possible  No        No              Clears and unlocks all registers
1            0              Not possible  Yes       No              Clears and unlocks all registers
0            1              0             Yes       Yes             Clears and unlocks all registers
1¹           1              0             Yes       Yes             Clear bit (c) does not unlock.
                                                                    Transition to "0,1,0" state.
0            1              1             No        Already locked  Clears and unlocks all registers
1¹           1              1             No        Already locked  Clear bit (c) does not unlock.
                                                                    Transition to "0,1,1" state.


¹These are special cases. It is possible that when EI_ADDR is read, only the correctable error bit is set and
the registers are not locked. By the time EI_STAT is read, an uncorrectable error is detected and the
registers are loaded again and locked. The value of EI_ADDR read earlier is no longer valid. Therefore, for
the "1,1,x" case, when EI_STAT is read, the correctable error bit is cleared and the registers are not
unlocked or cleared. Software must reexecute the IPR read sequence. On the second read operation, error
bits are in "0,1,x" state, all the related IPRs are unlocked, and EI_STAT is cleared.
6.1.2 MC Error Information Register 0
(MC_ERR0 - Offset = 800)


The low-order MC bus (system bus) address bits are latched into this register
when the system bus to PCI bus bridge detects an error event. If the event is a
hard error, the register bits are locked. A write to clear symptom bits in the
CAP Error Register unlocks this register. When the valid bit
(MC_ERR_VALID) in the CAP Error Register is clear, the contents are
undefined.


Bits      Field
<31:4>    Failing Address ADDR<31:04>
<3:0>     Reserved (0)


Table 6-3 MC Error Information Register 0

                             Initial
Name         Bits     Type   State    Description

ADDR<31:4>   <31:4>   RO     0        Contains the address of the
                                      transaction on the system
                                      bus when an error is
                                      detected.

Reserved     <3:0>    RO     0
6.1.3 MC Error Information Register 1
(MC_ERR1 - Offset = 840)


The high-order MC bus (system bus) address bits and error symptoms are
latched into this register when the system bus to PCI bus bridge detects an
error. If the event is a hard error, the register bits are locked. A write to clear
symptom bits in the CAP Error Register unlocks this register. When the valid
bit (MC_ERR_VALID) in the CAP Error Register is clear, the contents are
undefined.


Bits      Field
<31>      VALID
<30:21>   Reserved
<20>      Dirty
<19:17>   Reserved
<16:14>   DEVICE_ID
<13:8>    MC Command<5:0>
<7:0>     Failing Address ADDR<39:32>
Table 6-4 MC Error Information Register 1

                               Initial
Name           Bits      Type  State    Description

VALID          <31>      RO    0        Logical OR of bits
                                        <30:23> in the
                                        CAP_ERR Register. Set
                                        if MC_ERR0 and
                                        MC_ERR1 contain a
                                        valid address.

Reserved       <30:21>   RO    0

Dirty          <20>      RO    0        Set if the system bus
                                        error was associated
                                        with a Read/Dirty
                                        transaction. When set,
                                        the device ID field
                                        <19:14> does not
                                        indicate the source of
                                        the data.

Reserved       <19:17>   RO    0        All ones.

DEVICE_ID      <16:14>   RO    0        Slot number of bus
                                        master at the time of the
                                        error.

MC_CMD<5:0>    <13:8>    RO    0        Active command at the
                                        time the error was
                                        detected.

ADDR<39:32>    <7:0>     RO    0        Address bits <39:32> of
                                        the transaction on the
                                        system bus when an
                                        error is detected.
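
To show how the fields of Table 6-4 are extracted in practice, the following Python sketch decodes the MC_ERR1 value that appears in Example 5-9 (800e88fd). The bit positions are from the table. The function and key names are hypothetical.

```python
def decode_mc_err1(value: int) -> dict:
    """Split MC_ERR1 into the fields listed in Table 6-4."""
    return {
        "valid": bool((value >> 31) & 1),   # <31>    VALID
        "dirty": bool((value >> 20) & 1),   # <20>    Dirty
        "device_id": (value >> 14) & 0x7,   # <16:14> DEVICE_ID
        "mc_cmd": (value >> 8) & 0x3F,      # <13:8>  MC_CMD<5:0>
        "addr_39_32": value & 0xFF,         # <7:0>   ADDR<39:32>
    }


fields = decode_mc_err1(0x800E88FD)
is_io = bool(fields["addr_39_32"] & 0x80)  # address bit <39> set means I/O
print(fields, "I/O" if is_io else "Mem")
```

For this value the decoder gives a valid entry with device ID 2 (node ID 010, CPU0 in Table 5-10), command 08 (Read0 in Table 5-9), and address bit <39> set, that is, a Read0 to I/O space issued by CPU0.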
6.1.4 CAP Error Register
(CAP_ERR - Offset = 880)


CAP_ERR is used to log information pertaining to an error detected by the CAP 
or MDP ASIC. If the error is a hard error, the register is locked. All bits, except 
the LOST_MC_ERR bit, are locked on hard errors. CAP_ERR remains locked 
until the CAP error is written to clear each individual error bit. 


Bits      Field
<31>      MC_ERR_VALID
<30>      RDSB
<29>      RDSA
<28>      CRDB
<27>      CRDA
<26>      NXM
<25>      MC_ADR_PERR
<24>      LOST_MC_ERR
<23>      PIO_OVFL
<22:5>    Reserved
<4>       PCI_ERR_VALID
<3>       PTE_INV
<2>       MAB
<1>       SERR
<0>       PERR
Table 6-5 CAP Error Register

                               Initial
Name            Bits    Type   State    Description

MC_ERR_VALID    <31>    RO     0        Logical OR of bits <30:23>
                                        in this register. When set,
                                        MC_ERR0 and MC_ERR1
                                        are latched.

RDSB            <30>    RW1C   0        Uncorrectable ECC error
                                        detected by MDPB. Clear
                                        state in MDPB before
                                        clearing this bit.

RDSA            <29>    RW1C   0        Uncorrectable ECC error
                                        detected by MDPA. Clear
                                        state in MDPA before
                                        clearing this bit.

CRDB            <28>    RW1C   0        Correctable ECC error
                                        detected by MDPB. Clear
                                        state in MDPB_STAT before
                                        clearing this bit.

CRDA            <27>    RW1C   0        Correctable ECC error
                                        detected by MDPA. Clear
                                        state in MDPA_STAT before
                                        clearing this bit.

NXM             <26>    RW1C   0        System bus master
                                        transaction status NXM
                                        (Read with Address bit <39>
                                        set but transaction not pended
                                        or transaction target above
                                        the top of memory register.)
                                        CPU will also get a fill error
                                        on reads.

MC_ADR_PERR     <25>    RW1C   0        Set when a system bus
                                        command/address parity error
                                        is detected.
Table 6-5 CAP Error Register (continued)

                               Initial
Name            Bits    Type   State    Description

LOST_MC_ERR     <24>    RW1C   0        Set when an error is detected
                                        but not logged because the
                                        associated symptom fields
                                        and registers are locked with
                                        the state of an earlier error.

PIO_OVFL        <23>    RW1C   0        Set when a transaction that
                                        targets this system bus to PCI
                                        bus bridge is not serviced
                                        because the buffers are full.
                                        This is a symptom of setting
                                        the PEND_NUM field in
                                        CAP_CNTL to an incorrect
                                        value.

Reserved        <22:5>  RO     0

PCI_ERR_VALID   <4>     RO     0        Logical OR of bits <3:0> of
                                        this register. When set, the
                                        PCI error address register is
                                        locked.

PTE_INV         <3>     RW1C   0        Invalid page table entry on
                                        scatter/gather access.

MAB             <2>     RW1C   0        PCI master state machine
                                        detected PCI Target Abort
                                        (likely cause: NXM) (except
                                        Special Cycle). On reads fill
                                        error is also returned.

SERR            <1>     RW1C   0        PCI target state machine
                                        observed SERR#. CAP
                                        asserts SERR when it is
                                        master and detects target
                                        abort.

PERR            <0>     RW1C   0        PCI master state machine
                                        observed PERR#.
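
The CAP_ERR bit assignments in Table 6-5 lend themselves to a simple decoder. The Python below is an illustrative sketch with a hypothetical function name. It is applied to the CAP_ERR value shown in Example 5-9 (84000000).

```python
# CAP_ERR bit names, from Table 6-5.
CAP_ERR_BITS = {
    31: "MC_ERR_VALID",
    30: "RDSB",
    29: "RDSA",
    28: "CRDB",
    27: "CRDA",
    26: "NXM",
    25: "MC_ADR_PERR",
    24: "LOST_MC_ERR",
    23: "PIO_OVFL",
    4: "PCI_ERR_VALID",
    3: "PTE_INV",
    2: "MAB",
    1: "SERR",
    0: "PERR",
}


def decode_cap_err(value: int):
    """Return the names of the bits set in CAP_ERR, highest bit first."""
    return [
        name
        for bit, name in sorted(CAP_ERR_BITS.items(), reverse=True)
        if (value >> bit) & 1
    ]


print(decode_cap_err(0x84000000))  # ['MC_ERR_VALID', 'NXM']
```

Because MC_ERR_VALID is set in this value, MC_ERR0 and MC_ERR1 hold the latched address and command for the NXM error.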
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6.1.5 PCI Error Status Register 1 (PCI_ERR1 - Offset = 1040) 


PCI_ERR1 is used by the system bus to PCI bus bridge to log bus address 
<31:0> pertaining to an error condition logged in CAP_ERR. This register 
always captures PCI address <31:0>, even for a PCI DAC cycle. When the 
PCI_ERR_VALID bit in CAP_ERR is clear, the contents are undefined. 


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 


Failing Address ADDR<31:0> 


Table 6-6 PCI Error Status Register 1

                             Initial
Name         Bits     Type   State    Description

ADDR<31:0>   <31:0>   RO     0        Contains address bits
                                      <31:0> of the transaction
                                      on the PCI bus when an
                                      error is detected.
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Chapter 7 
Removal and Replacement 


This chapter describes removal and replacement procedures for field-replaceable 
units (FRUs). 


7.1 System Safety 


Observe the safety guidelines in this section to prevent personal injury. 


CAUTION: Wear an antistatic wrist strap whenever you work on a system. 

The AlphaServer cabinet system has a wrist strap connected to the frame at the front 
and rear. The pedestal system does not have an attached strap, so you will have to 
take one to the site. 


WARNING: When the system interlocks are disabled and the system is still powered 
on, voltages are low in the system drawer, but current is high. Observe the following 
guidelines to prevent personal injury. 


1. Remove any jewelry that may conduct electricity before working on the system.

2. Do not insert your hands between the fan and the power supply.

3. If you need to access the system card cage, power down the system and wait 2
   minutes to allow components in that area to cool.
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7.2 FRU List 


Figure 7-1 shows the locations of FRUs in the system drawer, and Table 7-1 lists the 
part numbers of all field-replaceable units. 


Figure 7-1 System Drawer FRU Locations 
Table 7-1 Field-Replaceable Unit Part Numbers


CPU Modules 

B3001-CA 300 MHz CPU, uncached
B3002-AB 300 MHz CPU, 2 Mbyte cache 
B3004-BA 300 MHz CPU, 2 Mbyte cache 
B3004-AA 400 MHz, 4 Mbyte cache 
B3004-DA 466 MHz, 4 Mbyte cache 


Memory Modules 


B3020-CA 64 Mbyte synch 


B3030-EA 256 Mbyte asynch (EDO) 

B3030-FA 512 Mbyte asynch (EDO) 

B3030-GA 2 Gbyte asynch (EDO) 

Required System Drawer Modules and Display 

54-23803-01 System motherboard (4100)

54-23803-02 System motherboard (early 4000)

54-23805-01 System motherboard (4000)

B3040-AA System bus to PCI bus bridge module (both systems) 
B3040-AB System bus to PCI bus bridge module (later 4000 only) 
54-24117-01 Power control module 

B3050-AA PCI motherboard (both systems) 

B3051-AA PCI motherboard (later 4000 only) 

54-24364-01 OCP logic module 

54-24366-01 OCP switch module 

54-24674-01 Server control module 

54-24691-01 Fan fail detect module (cabinet only) 

30-43049-01 OCP display 
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Table 7-1 Field-Replaceable Unit Part Numbers (continued) 


Fans 

12-23609-21 4.5-inch fan

12-24701-34 CPU fan 

Power System Components 

30-44712-01 Power supply (H7291-AA)

30-45353-01 Techniq AC Box (NA/Japan, H9A10-EB cabinet) 

30-45353-02 Techniq AC Box (Europe/AP, H9A10-EC cabinet) 

30-46788-01 Internal power source 40W/12V fan tray power 
(cabinet) 

H7600-AA Power controller (NA/Japan, H9A10-EL cabinet) 

H7600-DB Power controller (Europe/AP, H9A10-EM cabinet) 

12-23501-01 NEMA power strip (N.A./Japan, pedestal) 

12-45334-02 IEC power strip (Europe/AP, pedestal, and all 
cabinet systems) 

Internal Power Cords

17-04285-01 .5 meter IEC to IEC

17-00606-02 6 foot NEMA to IEC (N.A./Japan, pedestal) 

17-04285-02 2 meter IEC to IEC (Europe/AP, pedestal, and all 
cabinet systems.) 

17-04285-03 IEC to IEC StorageWorks shelf 

Fan Tray Cables (Cabinet Only) 

17-04324-01 Elec fan power harness

17-04325-01 12V power for SCM 

17-04338-01 Power ground cable 

17-04339-01 AC cable power 
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Table 7-1 Field-Replaceable Unit Part Numbers (continued) 


Server Control Module Power (Pedestal Only) 


30-46485-01 110V North America 
30-46485-02 220V Europe 
30-46485-03 Australia/N.Z. 
30-46485-04 220V U.K. 
System Drawer Cables and Jumpers        From                    To

17-04196-01   Server control module     Remote I/O signal       SCM signal conn
              signal cable (60 pin)     conn on PCI mbrd

17-04199-01   Current share cable       Current share conn      Current share conn on
                                        on PS0                  PS1 and PS2

17-04200-01   Floppy signal cable       Floppy conn on          Floppy
              (36 pin)                  PCI mbrd

17-04201-01   OCP signal                OCP conn on             OCP signal (system
                                        PCI mbrd                drawer only)

17-04201-02   OCP signal jumper         OCP                     OCP

17-04217-01   Power harness             Power supply(s)         7 conns. sys mbrd
              (4100 & early 4000)                               sys fans 0, 1
                                                                5V conn on PCI mbrd
                                                                CD-ROM drv pwr
                                                                Floppy pwr
                                                                1 OCP DC enable pwr
                                                                conn or pwr conn on ped
                                                                tray pwr drive cable
                                                                (17-04293-01)
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Table 7-1 Field-Replaceable Unit Part Numbers (continued) 


System Drawer Cables and Jumpers        From                    To

17-04358-01   Power harness             Power supply(s)         3 conns. sys mbrd
              (later 4000 only)                                 sys fans 0, 1
                                                                5V conn on PCI/EISA mbrd
                                                                5V & 3V conn on PCI
                                                                (right) mbrd
                                                                CD-ROM drv pwr
                                                                Floppy pwr
                                                                1 OCP DC enable pwr
                                                                conn or pwr conn on ped
                                                                tray pwr drive cable
                                                                (17-04293-01)

17-04292-01   SCSI CD-ROM sig cable     CD-ROM conn on          CD-ROM sig conn
                                        PCI mbrd

70-32016-01   Interlock switches and    Interlock switch        Other OCP DC enable pwr
              cable (4100 & early       assy                    conn or pwr conn on ped
              4000)                                             tray pwr drive cable
                                                                (17-04293-01)

70-33002-01   Interlock switches and    Interlock switch        Other OCP DC enable pwr
              cable (later 4000         assy                    conn or pwr conn on ped
              only)                                             tray pwr drive cable
                                                                (17-04293-01)

17-04349-01   SCM 12V interlock         Interlock conn          12 V DC enable conn on
              jumper                    on PCI mbrd             SCM

17-04350-01   SCM 34-position jumper    SCM                     SCM

17-04351-01   SCM 12V power jumper      Power harness           Sys fan 2 and SCM
              (4100 & early 4000        (17-04217-01)           internal 12V conn
              only)

17-04363-01   SCM 16-position jumper    SCM sig conn            16 pos conn on SCM
                                        on PCI mbrd
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Table 7-1 Field-Replaceable Unit Part Numbers (continued) 


Pedestal Cables                         From                    To

17-04293-01   Elec harness power        Power harness           Ped tray bulkhead
              cable +5/+12              (17-04217-01)           (system side)

17-04302-01   OCP signal cable          OCP sig conn            OCP sig conn on ped tray
                                        on PCI mbrd             bulkhead (system side)

17-04305-01   Harness power             Power conn on ped       Both OCP DC enable pwr
              cable +5/+12              tray bulkhd (tray       conn and pwr conn on
                                        side)                   optional SCSI drive

17-04306-01   SCSI signal cable         SCSI sig conn on        Optional SCSI drive
              (narrow)                  ped tray bulkhd
                                        (tray side)

17-04380-01   OCP signal cable          OCP sig conn on         OCP sig conn
                                        ped tray bulkhd
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7.3 4100 Power System FRUs 


Figure 7-2 Location of 4100 Power System FRUs 
     Part Number    Description

1    30-45353-01    AC input box; only in cabinet systems: The 01 variant is for N.
     30-45353-02    Amer./Japan and has a NEMA L6-30P power cord; the 02 variant is
                    for Europe and AP and has an IEC 309 power cord.

2    12-23501-01    AC power strip: The 12-23501-01 is used on pedestals in N.
     12-45334-02    Amer./Japan only and has six NEMA outlets and a 15 ft. cord to the
                    wall outlet; the 12-45334-02 is used on pedestals in Eur./AP and on
                    cabinet systems worldwide and has six IEC320 outlets. In pedestal
                    systems, cords match country-specific wall outlets.
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PartNumber Description 


2a 


CS PN HD nm 


11 


12 
13 


14 


~ 17-04285-01 


H7600-AA 


H7600-DB 


17-00606-02 
17-04285-02 


30-44712-01 
17-04199-01 
17-04217-01 
17-04201-01 
70-32016-01 
17-04351-01 
17-04293-01 


17-04302-01 


17-04201-01 
17-04305-01 


17-04339-01 


Power cord from AC input box to power strip. .5 meter, IEC320 to 
IEC320 connector used in cabinet systems only. In pedestal 
systems, cords match country-specific wall outlets. 


Power controller used in place of 30-45353-01, 12-45334-02, and 
17-04285-02 in the H9A10-EL cabinet in N. America/Japan 


Power controller used in place of 30-45353-02, 12-45334-02, and 
17-04285-02 in the H9A10-EM cabinet in Europe/AP 


Power cord from power strip to power supply: The 17-00606-02 is a 
2 m NEMA to IEC320 AC jumper used with the 12-23501-01 power 
strip in N. Amer./Japan pedestals. The 17-04285-02 is a 2 m IEC320 
to IEC320 AC jumper used with the 12-45334-02 power strip used 
on pedestals in Eur./APA and on cabinet systems worldwide and has 
six IEC320 outlets. In pedestal systems, cords match country- 
specific wall outlets. 


Power supply; 92 to 264 VAC input; one to three in a system drawer. 
Cable connecting power supplies 

Power distribution harness (4100 and early 4000) 

Cable from OCP to PCI motherboard (cabinet system) 

Interlock switches and cable to OCP (4100 and early 4000) 

Power from power harness between harness and Fan 2 to SCM 


Cable from power harness to interconnect cable and pedestal tray 
connector (pedestal system) 


Cable from pedestal tray connector to PCI motherboard (pedestal 
system) 


Cable from pedestal tray connector to OCP (pedestal system) 


Cable from pedestal tray connector to OCP and SCSI devices 
(pedestal system) 


Power cord from power strip to cabinet fan tray (cabinet only) 
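

The region and enclosure rules in the table above can be written as a small lookup. The following Python sketch is illustrative only: the function name ac_parts and the region and enclosure labels are hypothetical, and only the part numbers come from the table. It assumes a cabinet that uses the AC input box and power strip, so it does not cover the H7600 power controllers used in the H9A10-EL and -EM cabinets.

```python
# Illustrative lookup of AC power FRUs by region and enclosure.
# Part numbers come from the FRU table; the function name and the
# region/enclosure labels are assumptions made for this sketch.

def ac_parts(region, enclosure):
    """Return a dict of AC power FRU part numbers.

    region:    "na_japan" or "europe_ap"
    enclosure: "pedestal" or "cabinet"
    """
    if region not in ("na_japan", "europe_ap"):
        raise ValueError("unknown region: %r" % (region,))
    if enclosure not in ("pedestal", "cabinet"):
        raise ValueError("unknown enclosure: %r" % (enclosure,))

    parts = {}
    if enclosure == "cabinet":
        # The AC input box and its 0.5 m cord are used in cabinets only.
        parts["ac_input_box"] = (
            "30-45353-01" if region == "na_japan" else "30-45353-02"
        )
        parts["input_box_cord"] = "17-04285-01"

    if enclosure == "pedestal" and region == "na_japan":
        # NEMA power strip with the NEMA to IEC320 jumper.
        parts["power_strip"] = "12-23501-01"
        parts["supply_jumper"] = "17-00606-02"
    else:
        # IEC320 power strip with the IEC320 to IEC320 jumper.
        parts["power_strip"] = "12-45334-02"
        parts["supply_jumper"] = "17-04285-02"
    return parts


if __name__ == "__main__":
    print(ac_parts("na_japan", "pedestal"))
    print(ac_parts("europe_ap", "cabinet"))
```

Keeping the two conditions (cabinet-only parts, then strip and jumper pairing) separate mirrors how the table describes them, which makes the sketch easy to check against the part list.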

7.4 4000 Power System FRUs


Figure 7-3 Location of 4000 Power System FRUs 

      Part Number   Description

1     30-45353-01   AC input box; only in cabinet systems: The 01 variant is for
      30-45353-02   N. Amer./Japan and has a NEMA L6-30P power cord; the 02
                    variant is for Europe and AP and has an IEC 309 power cord.

2     12-23501-01   AC power strip: The 12-23501-01 is used on pedestals in
      12-45334-02   N. Amer./Japan only and has six NEMA outlets and a 15 ft.
                    cord to the wall outlet; the 12-45334-02 is used on
                    pedestals in Eur./AP and on cabinet systems worldwide and
                    has six IEC320 outlets. In pedestal systems, cords match
                    country-specific wall outlets.

2a    17-04285-01   Power cord from AC input box to power strip. 0.5 meter,
                    IEC320 to IEC320 connector used in cabinet systems only.
                    In pedestal systems, cords match country-specific wall
                    outlets.

      H7600-AA      Power controller used in place of 30-45353-01,
                    12-45334-02, and 17-04285-02 in the H9A10-EL cabinet in
                    N. America/Japan.

      H7600-DB      Power controller used in place of 30-45353-02,
                    12-45334-02, and 17-04285-02 in the H9A10-EM cabinet in
                    Europe/AP.

3     17-00606-02   Power cord from power strip to power supply:
      17-04285-02   The 17-00606-02 is a 2 m NEMA to IEC320 AC jumper used
                    with the 12-23501-01 power strip in N. Amer./Japan
                    pedestals. The 17-04285-02 is a 2 m IEC320 to IEC320 AC
                    jumper used with the 12-45334-02 power strip, which is
                    used on pedestals in Eur./AP and on cabinet systems
                    worldwide and has six IEC320 outlets. In pedestal systems,
                    cords match country-specific wall outlets.

4     30-44712-01   Power supply; 92 to 264 VAC input; one to three in a
                    system drawer.

5     17-04199-01   Cable connecting power supplies

6     17-04385-01   Power distribution harness (later 4000 only)

7     17-04201-01   Cable from OCP to PCI motherboard (cabinet system)

8     70-33002-01   Interlock switches and cable to OCP (later 4000 only)

      17-04293-01   Cable from power harness to interconnect cable and
                    pedestal tray connector (pedestal system)

11    17-04302-01   Cable from pedestal tray connector to PCI motherboard
                    (pedestal system)

12    17-04201-01   Cable from pedestal tray connector to OCP (pedestal system)

13    17-04305-01   Cable from pedestal tray connector to OCP and SCSI devices
                    (pedestal system)

14    17-04339-01   Power cord from power strip to cabinet fan tray (cabinet
                    only)
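

Two power FRUs differ between the 4100/early 4000 list and the later 4000 list: the power distribution harness and the interlock switch assembly. The Python sketch below records that difference as a lookup. The variant and FRU keys are hypothetical names chosen for this example, and the pairing of part numbers with descriptions assumes the two columns of each list correspond in the order given.

```python
# Variant-specific power FRUs. Keys are illustrative names, not DIGITAL
# terminology; part numbers are paired with descriptions in listed order.
VARIANT_PARTS = {
    "4100_or_early_4000": {
        "power_distribution_harness": "17-04217-01",
        "interlock_switches_and_cable": "70-32016-01",
    },
    "later_4000": {
        "power_distribution_harness": "17-04385-01",
        "interlock_switches_and_cable": "70-33002-01",
    },
}


def variant_part(variant, fru):
    """Return the part number of a variant-specific FRU."""
    if variant not in VARIANT_PARTS:
        raise ValueError("unknown system variant: %r" % (variant,))
    if fru not in VARIANT_PARTS[variant]:
        raise ValueError("unknown FRU: %r" % (fru,))
    return VARIANT_PARTS[variant][fru]


if __name__ == "__main__":
    print(variant_part("later_4000", "power_distribution_harness"))
```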

7.5 System Drawer Exposure (Cabinet)


There are two cabinet types for these systems: the H9A10-EB/-EC cabinet and
the H9A10-EL/-EM cabinet. System drawer exposure differs depending upon
the cabinet.


7.5.1 Cabinet Drawer Exposure (H9A10-EB & -EC) 


Open both doors, disconnect cables that obstruct movement of the drawer, 
remove the shipping brackets, and slide the drawer out from the cabinet. 


Figure 7-4 Exposing System Drawer (H9A10-EB & -EC Cabinet) 

Exposing the System Bus or PCI Bus Card Cages


1. Open the front and rear doors of the cabinet.

2. At the front of the cabinet, unplug the drawer’s power supplies.

3. At the rear, remove the two Phillips screws holding the shipping bracket on the
right rail so that the drawer can be pulled out.

4. Using a flathead screwdriver, disengage the lock mechanism at the lower left
hand corner of the drawer.

5. Pull the drawer out part way and release the lock mechanism by removing the
screwdriver. If you wish to remove the whole drawer for some reason, leave the
screwdriver in place.

6. Once the lock mechanism has been released, slide the drawer out until it locks.

7. Remove the system bus card cage cover. Unscrew the two Phillips head screws
holding the cover in place and slide it off the drawer.

8. Remove the PCI bus card cage cover. Unscrew the three Phillips head screws
holding the cover to the side of the drawer and slide it off the drawer.


Exposing the Power System or System Fans 


1. Open the front and rear doors of the cabinet.

2. At the rear of the cabinet, remove any cables from PCI options that may
interfere with pulling the drawer forward.

3. At the front, remove the shipping brackets on the right and left rails that hold the
drawer.

4. Pull out the drawer until it locks.

5. Remove the power section cover. Unscrew the two Phillips head screws and
slide the cover off the drawer.

7.5.2 Cabinet Drawer Exposure (H9A10-EL & -EM)


In the H9A10-EL and -EM cabinet, the system drawer sits on a tray that slides
out of the front of the cabinet. A stabilizer bar must be pulled out from the
bottom to prevent the cabinet from tipping over.


Figure 7-5 Exposing System Drawer (H9A10-EL & -EM Cabinet)


CAUTION: The cabinet could tip over if a system drawer is pulled out and the
stabilizer bar is not fully extended with its leveler foot on the floor.


Exposing any section of the system drawer in an H9A10-EL or -EM Cabinet. 


1. Open the front door of the cabinet.

2. Pull the stabilizer bar at the bottom of the cabinet out until it stops.

3. Extend the leveler foot at the end of the stabilizer bar to the floor.

4. Unplug the drawer’s power supplies.

5. Remove the Phillips screws holding the shipping bracket to the rails so that the
drawer can be pulled out.

6. Pull the drawer all the way out until it locks.

7. To access the system bus card cage cover, unscrew the two Phillips head screws
holding the cover in place and slide it off.

8. To access the PCI/EISA bus card cage, unscrew the three Phillips head screws
holding the cover to the right side of the drawer and slide it off.

9. To access the PCI bus card cage, unscrew the three Phillips head screws holding
the cover to the left side of the drawer and slide it off.

10. To access the power or fan section, unscrew the two Phillips head screws
holding the cover in place and slide it off.

7.6 System Drawer Exposure (Pedestal)


Figure 7-5 Exposing System Drawer (Pedestal) 

Exposing the System Drawer


1. Open the front door and remove it by lifting and pulling it away from the
system.

2. Remove the top cover. Unscrew the two Phillips head screws midway up on
each side of the pedestal, tilt the cover up, and lift it away from the frame.

3. Remove the system bus card cage cover at the back of the pedestal if you are
replacing any of the following: CPU, memory, power control module, system
bus to PCI bus module, system motherboard, cables that attach to the system
motherboard, or a system fan. To remove the cover, unscrew the two Phillips
head screws and slide the cover off the drawer.

4. Remove the PCI bus card cage cover at the back of the pedestal if you are
replacing any of the following: PCI or EISA option, server control module, PCI
motherboard, cables attached to the PCI motherboard. To remove the cover,
unscrew the three Phillips head screws holding the cover to the side of the
drawer and slide the cover off the drawer.

5. Remove the pedestal tray as described below if you are replacing any of the
following: system fan, power supply, power cables.


Removing the Pedestal Tray 


1. Remove the tray cover by loosening the screws at the back of the tray.

2. Disconnect the cables from the OCP and any optional SCSI device from the
bulkhead connector in the rear right corner of the tray.

3. Unscrew the Phillips head screw holding the bulkhead to the tray.

4. Unscrew the two Phillips head retaining screws and slide the tray off the drawer.




7.7 CPU Removal and Replacement 


CAUTION: Several different CPU modules work in these systems. Unless you are 
upgrading, be sure you are replacing the broken module with the same variant. 
B3001 and B3002 can only be used in AlphaServer 4100 systems. 


Figure 7-6 Removing CPU Module 

WARNING: CPU modules and memory modules have parts that operate at high
temperatures. Wait 2 minutes after power is removed before touching any module.


Removal

1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Expose the system bus card cage. Remove the two Phillips head screws holding 
the cover in place and slide it off the drawer. 


4. Identify and remove the faulty CPU. A label to the left of the system bus card cage
identifies which slot contains CPU0, CPU1, CPU2, or CPU3. The CPU is held
in place with levers at both ends; simultaneously raise the levers and lift the
CPU from the cage.

Replacement 


Reverse the steps in the Removal procedure. 


Verification — DIGITAL UNIX and OpenVMS Systems 


1. Bring the system up to the SRM console by pressing the Halt button, if 
necessary. 


2. Issue the show cpu command to display the status of the new module. 


Verification — Windows NT Systems 


1. Start AlphaBIOS Setup, select Display System Configuration, and press 
Enter. 


2. Using the arrow keys, select MC Bus Configuration to display the status of the 
new module. 

7.8 CPU Fan Removal and Replacement


Figure 7-7 Removing CPU Fan


Removal

1. Follow the CPU Removal and Replacement procedure. 

2. Unplug the fan from the module. 

3. Remove the four Phillips head screws holding the fan to the Alpha chip’s 
heatsink. 

Replacement 


Reverse the above procedure. 


Verification 


If the system powers up, the CPU fan is working. 

7.9 Memory Removal and Replacement


CAUTION: Several different memory modules work in these systems. Be sure you
are replacing the broken module with the same variant.


Figure 7-8 Removing Memory Module


WARNING: CPU modules and memory modules have parts that operate at high
temperatures. Wait 2 minutes after power is removed before touching any module.


Removal


1. Shut down the operating system and power down the system. 

2. Expose the system drawer. 

3. Expose the system bus card cage. Remove the two Phillips head screws holding 
the cover in place and slide it off the drawer. 

4. Identify and remove the faulty module. A label to the left of the system card 
cage identifies which slot contains the high or low halves of memory banks. 
The memory module is held in place by a flathead captive screw attached to the 
top brace of the module. Loosen the screw and lift the module from the cage. 

Replacement 


Reverse the steps in the Removal procedure. 


NOTE: Memory modules must be installed in pairs. When you replace a bad 
module, be sure the second module in the pair is in place. 


Verification — DIGITAL UNIX and OpenVMS Systems 


1. Bring the system up to the SRM console by pressing the Halt button, if
necessary.

2. Issue the show memory command to display the status of the new memory.

3. Verify the functioning of the new memory by issuing the command test memn,
where n is 0, 1, 2, 3, or *.


Verification — Windows NT Systems 


1. Start AlphaBIOS Setup, select Display System Configuration, and press
Enter.

2. Using the arrow keys, select Memory Configuration to display the status of the
new memory.

3. Switch to the SRM console (press the Halt button in so that the LED on the
button lights and reset the system). Verify the functioning of the new memory
by issuing the command test memn, where n is 0, 1, 2, 3, or *.
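

The form of the memory test command can be illustrated with a short Python sketch. It only builds the command string described in the verification steps (test mem followed by 0, 1, 2, 3, or *); the helper name mem_test_command is hypothetical, and the sketch does not talk to a console.

```python
# Build the console command string for testing memory after a module swap.
# The helper name is an assumption for this sketch; the command form and
# the allowed values of n (0, 1, 2, 3, or *) come from the procedure.

VALID_TARGETS = ("0", "1", "2", "3", "*")


def mem_test_command(n):
    """Return the console command string for memory target n."""
    target = str(n)
    if target not in VALID_TARGETS:
        raise ValueError("n must be 0, 1, 2, 3, or *; got %r" % (n,))
    return "test mem" + target


if __name__ == "__main__":
    for n in (0, "*"):
        print(mem_test_command(n))
```

Rejecting any other value of n before the command is typed avoids sending a malformed command to the console.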

7.10 Power Control Module Removal and
Replacement


Figure 7-9 Removing Power Control Module


Removal

1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Expose the system bus card cage. Remove the two Phillips head screws holding 
the cover in place and slide it off the drawer. 


4. Remove the faulty PCM. The PCM is located in the back left corner of the 
system bus card cage. A captive flathead screw and the rear card guide hold the 
PCM in place. Unscrew the screw and lift the module from the cage. 

Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system. If the PCM is faulty or not seated properly, the system will not 
come up. 

7.11 System Bus to PCI Bus Bridge (B3040-AA)
Module Removal and Replacement


Figure 7-10 Removing System Bus to PCI/EISA Bus Bridge Module
(B3040-AA)

Removal


1. Shut down the operating system and power down the system.

2. Expose the system drawer.

3. Expose the system bus card cage. Remove the two Phillips head screws holding
the cover in place and slide it off the drawer.

4. Expose the PCI bus card cage. Remove three Phillips head screws holding the
cover in place and slide it off the drawer.

5. Remove all the PCI/EISA options.

6. Remove the server control module.

7. Remove the PCI motherboard.

8. Remove the two Phillips head screws holding the system bus to PCI bus bridge
module to the sheet metal between the system bus card cage and the PCI bus
card cage.

9. Remove enough CPU and memory modules to the right of the bridge module to
allow a flathead screwdriver to be inserted in the slot in the middle of the
module’s top bracket.

10. Place a flathead screwdriver into the slot in the middle of the module’s top
bracket and into the corresponding slot in the sheet metal between the two card
cages. Use the screwdriver as a lever to disconnect the bridge module from the
connector on the system motherboard.

11. Remove the bridge module from the system bus card cage.


Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system (press the Halt button if necessary to bring up the SRM 
console) and issue the show device command at the console prompt to verify that the 
system sees all system options and peripherals. 

7.12 System Bus to PCI Bus Bridge (B3040-AB)
Module Removal and Replacement 


Figure 7-11 Removing System Bus to PCI Bus Bridge Module 
(B3040-AB) 

Removal


1. Shut down the operating system and power down the system.

2. Expose the system drawer.

3. Expose the system bus card cage. Remove the two Phillips head screws holding
the cover in place and slide it off the drawer.

4. Expose the PCI bus card cage on the right side of the drawer. Remove three
Phillips head screws holding the cover in place and slide it off the drawer.

5. Remove all the PCI options.

6. Remove the PCI motherboard.

7. Remove the two Phillips head screws holding the system bus to PCI bus bridge
module to the sheet metal between the system bus card cage and the PCI bus
card cage.

8. Remove enough CPU and memory modules to the left of the bridge module to
allow a flathead screwdriver to be inserted in the slot in the middle of the
module’s top bracket.

9. Place a flathead screwdriver into the slot in the middle of the module’s top
bracket and into the corresponding slot in the sheet metal between the two card
cages. Use the screwdriver as a lever to disconnect the bridge module from the
connector on the system motherboard.

10. Remove the bridge module from the system bus card cage.


Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system (press the Halt button if necessary to bring up the SRM 
console) and issue the show device command at the console prompt to verify that the 
system sees all system options and peripherals. 

7.13 System Motherboard (4100 & early 4000)
Removal and Replacement 


The system motherboard contains an NVRAM that holds the system serial 
number. Be sure to record this number before replacing the module. The serial 
number is on a barcode on the side of the system drawer or on the system bus 
card cage. The part number for the 4100 is 54-23803-01 and for the early 4000 
is 54-23803-02. 


Figure 7-12 Removing System Motherboard 

Removal

1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Expose the system bus card cage by removing the two Phillips head screws 
holding it in place and sliding the cover off the drawer. 


4. Remove all CPUs, memory modules, and the PCM from the system 
motherboard. 

5. Expose the PCI bus card cage. Remove three Phillips head screws holding the
cover in place and slide it off the drawer. 


6. Remove all the PCI/EISA options. 

7. Remove the server control module. 

8. Remove the PCI motherboard. 

9. Remove system bus to PCI bus module from the system motherboard. 


10. Remove the bracket holding the power cables in place as they pass from the 
system bus section to the power section of the drawer. 


11. Disconnect all cables to the system motherboard and lay them back over the 
power supply section of the system drawer. 


CAUTION: Secure the power harness connectors in the system card cage to 
ensure that they cannot damage the pins in the CPU connectors. 


12. Remove both the front and back module card guides. Unscrew the two screws 
that hold the guides in place. 


13. Remove the system motherboard from the card cage by removing the 15 Phillips 
head screws holding it in place. Record the system serial number. (The serial 
number is on a barcode on the side of the system drawer or on the system bus 
card cage.) 


Replacement 


Reverse the above procedure. To align the motherboard in the cage, start replacing 
the screws in the corners next to the system bus to PCI bus bridge module and then 
the PCM module. Subsequent screws should align properly. 


Verification 


1. Power up the system (press the Halt button if necessary to bring up the SRM 
console) and issue the show device command at the console prompt to verify 
that all system options are seen. 


2. Restore the system serial number by issuing the set sys_serial_num command at 
the SRM console prompt. 

7.14 System Motherboard (4000) Removal and
Replacement 


The system motherboard contains an NVRAM that holds the system serial 
number. Be sure to record this number before replacing the module. The serial 
number is on a barcode on the side of the system drawer or on the system bus 
card cage. The part number for the later 4000 is 54-23805-01. 


Figure 7-13 Removing System Motherboard 

Removal

1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Expose the system bus card cage by removing the two Phillips head screws 
holding it in place and sliding the cover off the drawer. 


4. Remove all CPUs, memory modules, and the PCM from the system 
motherboard. 


5. Expose both PCI bus card cages. Remove three Phillips head screws holding 
each cover in place and slide them off the drawer. 

6. Remove all the PCI/EISA options.

7. Remove the server control module.

8. Remove the PCI motherboards.

9. Remove both bridge modules from the system motherboard.

10. Remove the bracket holding the power cables in place as they pass from the
system bus section to the power section of the drawer.

11. Disconnect all cables to the system motherboard and lay them back over the
power supply section of the system drawer.

CAUTION: Secure the power harness connectors in the system card cage to
ensure that they cannot damage the pins in the CPU connectors.

12. Remove both the front and back module card guides. Unscrew the two screws
that hold the guides in place.

13. Remove the system motherboard from the card cage by removing the 15 Phillips
head screws holding it in place. Record the system serial number. (The serial
number is on a barcode on the side of the system drawer or on the system bus
card cage.)


Replacement 


Reverse the above procedure. To align the motherboard in the cage, replace screws 
in adjacent corners of the module. Subsequent screws should align properly. 


Verification 


1. Power up the system (press the Halt button if necessary to bring up the SRM
console) and issue the show device command at the console prompt to verify
that all system options are seen.

2. Restore the system serial number by issuing the set sys_serial_num command at
the SRM console prompt.

7.15 PCI/EISA Motherboard (B3050) Removal and
Replacement


Figure 7-14 Replacing PCI/EISA Motherboard

Removal


The PCI motherboard contains an NVRAM with ECU data and customized console
environment variables. Therefore, if the console runs, execute a show * command at
the console prompt and, if you have not done so earlier, record the settings for the
sys_model_num and sys_type environment variables. These environment
variables are used to display the system model number and type, and to compute
certain information passed to the operating system. When you replace the PCI
motherboard, these environment variables are lost and must be restored after the
module swap.


1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Expose the PCI bus card cage. Remove three Phillips head screws holding the 
cover in place and slide it off the drawer. 


4. Remove all PCI and EISA options.

5. Disconnect all cables connected to the PCI motherboard.

6. Remove the server control module.

7. Unscrew the two screws holding the system bus to PCI bus bridge module in the
system bus card cage to the PCI motherboard.


8. Remove the nine Phillips head screws that hold the motherboard in place. To 
reach the screws on the bottom of the board, thread your screwdriver through the 
three holes in the sheet metal. 


9. Carefully pry the motherboard loose from the system bus to PCI bus bridge 
module on the other side of the sheet metal separating the system bus card cage 
from the PCI card cage. 


10. Remove the motherboard from the card cage. 


Replacement 


Reverse the steps in the Removal procedure. 


Verification 


1. Power up the system (press the Halt button if necessary to bring up the SRM 
console) and issue the show device command at the console prompt to verify 
that the system sees all options. 


2. Restore the sys_model_num, sys_type, and other customized environment 
variables to their previous settings. Run the ECU to restore EISA configuration 
data. This must be done regardless of whether there is an EISA option in the 
EISA slot on PCI 0. 
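

Recording the environment variables before the swap and re-entering them afterward is easy to get wrong by hand. The Python sketch below shows one way to turn a saved copy of the console listing into the set commands to type after replacement. It is a sketch under stated assumptions: the function name restore_commands is hypothetical, the saved text is assumed to hold one variable per line as a name followed by whitespace and a value (the actual console layout is not shown in this manual), and the sample values are made up for illustration.

```python
# Turn a saved listing of console environment variables into "set" commands.
# Assumption: each saved line is "<name><whitespace><value>". Adjust the
# parsing if the captured console output is laid out differently.

def restore_commands(saved_text, names=("sys_model_num", "sys_type")):
    """Return a list of 'set <name> <value>' strings for the given names."""
    values = {}
    for line in saved_text.splitlines():
        fields = line.split(None, 1)  # split on the first run of whitespace
        if len(fields) == 2 and fields[0] in names:
            values[fields[0]] = fields[1].strip()

    missing = [name for name in names if name not in values]
    if missing:
        raise ValueError("not found in saved listing: " + ", ".join(missing))
    return ["set %s %s" % (name, values[name]) for name in names]


if __name__ == "__main__":
    # Hypothetical saved listing; the values here are invented.
    saved = "sys_model_num    4100\nsys_type    12\nother_var    x\n"
    for command in restore_commands(saved):
        print(command)
```

Raising an error when a variable is missing from the saved listing makes it obvious, before the module is pulled, that the record is incomplete.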

7.16 PCI Motherboard (B3051) Removal and
Replacement 


Figure 7-15 Replacing PCI Motherboard 




Removal 


1. Shut down the operating system and power down the system. 

2. Expose the system drawer. 

3. Expose the PCI bus card cage on the right when viewing the drawer from the 
rear. Remove three Phillips head screws holding the cover in place, and slide it 
off the drawer. 

4. Remove all PCI options. 

5. Disconnect all cables connected to the PCI motherboard. 

6. Unscrew the two screws holding the system bus to PCI bus bridge module in the 
system bus card cage to the PCI motherboard. 

7. Remove the nine Phillips head screws that hold the motherboard in place. To 
reach the screws on the bottom of the board, thread your screwdriver through the 
three holes in the sheet metal. 

8. Carefully pry the motherboard loose from the system bus to PCI bus bridge 
module on the other side of the sheet metal separating the system bus card cage 
from the PCI card cage. 

9. Remove the motherboard from the card cage. 

Replacement 


Reverse the steps in the Removal procedure. 


Verification 


1. Power up the system (press the Halt button if necessary to bring up the SRM
console) and issue the show device command at the console prompt to verify
that the system sees all options.

7.17 Server Control Module Removal and
Replacement


Figure 7-16 Removing Server Control Module


Removal

1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Expose the PCI bus card cage. Remove three Phillips head screws holding the 
cover in place and slide it off the drawer. 


4. Disconnect the cables connected at the bulkhead to the server control module. 


5. If necessary, remove several PCI and EISA options from the bottom of the PCI 
card cage up until you can access the server control module. 


6. Disconnect the two cables connected to the PCI motherboard at the server 
control module end. 


7. Disconnect the twisted pair power cable from the module. 


8. Place a credit card or a piece of cardboard between the edge of the SCM module
and the B3050 module to protect the delicate pins of the ASICs located in the
lower right corner of the B3050 module.


9. The server control module is held in place by four stud snaps. Using a flathead
screwdriver, gently pry the SCM module off the snaps and remove it. Make sure
you do not hit the pins of the ASICs on the B3050.


Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Verify console output on COM1. 




7.18 PCI/EISA Option Removal and Replacement 


Figure 7-17 Removing PCI/EISA Option 

WARNING: To prevent fire, use only modules with current limited outputs. See
National Electrical Code NFPA 70 or Safety of Information Technology Equipment,
Including Electrical Business Equipment EN 60 950.


Removal

1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Expose the PCI bus card cage. Remove three Phillips head screws holding the 
cover in place and slide it off the drawer. 


4. Remove the faulty option. Disconnect cables connected to the option. Unscrew 
the small Phillips head screw securing the option to the card cage. Slide the 
option from the card cage. 

Replacement 


Reverse the steps in the Removal procedure. 


Verification — DIGITAL UNIX and OpenVMS Systems 


1. Power up the system (press the Halt button if necessary to bring up the SRM 
console) and run the ECU to restore EISA configuration data. 


2. Issue the show config command or show device command at the console 
prompt to verify that the system sees the option you replaced. 


3. Run any diagnostic appropriate for the option you replaced. 


Verification — Windows NT Systems 


1. Start AlphaBIOS Setup, select Display System Configuration, and press 
Enter. 


2. Using the arrow keys, select PCI Configuration or EISA Configuration to 
determine that the new option is listed. 

7.19 Power Supply Removal and Replacement


Figure 7-18 Removing Power Supply


Removal


1. Shut down the operating system and power down the system. 

2. Expose the system drawer. 

3. Remove the cover to the power section of the drawer. Remove the two Phillips 
head screws holding the cover in place and slide it off the drawer. 

4. Release the power supply tray by removing the two Phillips head screws on the
side of the drawer. See Figure 7-18.

5. Lift the power supply tray to release it from the sheet metal and slide it out from 
the drawer until it locks (about 4 inches). 

6. Tilt the tray to allow easier access to the back of the power supplies. 

7. Unplug the connectors at the rear of the supply that is being replaced. 

8. Unscrew the four Phillips head screws at the front of the tray that hold the power
supply in place. Also unscrew the two screws at the back of the power supply.
See Figure 7-18.

9. Remove the power supply. 

Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system. If the system has redundant power, it will power up
regardless of whether the replaced power supply is faulty. In this case, look at the
PCM LEDs to determine whether the power supply is functioning properly. If the
system does not have redundant power and the replaced supply is faulty, the system
will not power up.
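

The verification logic above depends on whether the system has redundant power. The Python sketch below restates it as a small decision function. The function name, argument names, and returned strings are hypothetical; the "inconclusive" result for a redundant system that fails to power up is an addition for completeness, since the procedure does not cover that case.

```python
# Decision sketch for verifying a replaced power supply.
# Names and result strings are assumptions made for illustration.

def power_supply_verdict(redundant, powered_up, pcm_leds_ok=None):
    """Return a short verdict string for the replaced power supply.

    redundant:   True if the system has redundant power
    powered_up:  True if the system powered up
    pcm_leds_ok: True/False once the PCM LEDs have been read, else None
    """
    if not redundant:
        # Without redundancy, a faulty supply keeps the system from
        # powering up, so power-up itself is the check.
        return "supply ok" if powered_up else "supply suspect"

    if not powered_up:
        # Not covered by the procedure: with redundant power the system
        # is expected to power up even with one faulty supply.
        return "inconclusive"

    if pcm_leds_ok is None:
        return "check PCM LEDs"
    return "supply ok" if pcm_leds_ok else "supply suspect"


if __name__ == "__main__":
    print(power_supply_verdict(redundant=True, powered_up=True))
    print(power_supply_verdict(redundant=False, powered_up=True))
```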

7.20 Power Harness (4100 and early 4000) Removal
and Replacement


Figure 7-19 Removing Power Harness


Removal


1. Shut down the operating system and power down the system. 

2. Expose the system drawer. 

3. Expose the power, system card cage, and PCI/EISA sections of the drawer by 
removing all covers. Unscrew the Phillips head screws holding each cover in 
place and slide the covers off the drawer. 

4. Release the power supply tray by removing the two Phillips head screws on the 
side of the drawer. 

5. Lift the power supply tray to release it from the sheet metal and slide it out from 
the drawer until it locks. 

6. Tilt the tray to allow easier access to the fans. 

7. Remove the bracket holding the power harness as it passes from the power 
section to the system card cage section of the drawer. Remove the three Phillips 
head screws holding the bracket in place. 

8. Disconnect the power harness from the system motherboard and fold the harness 
back over the power supplies. 

CAUTION: Secure the power harness connectors in the system card cage to 
ensure that they cannot damage the pins in the CPU connectors. 

9. Disconnect the two power connectors from the PCI/EISA motherboard. Push the
power cable through the hole from the PCI/EISA section into the power section.

10. Disconnect the fan power cables from the power harness. 

11. Remove the four Phillips head screws holding the OCP tray to the drawer. 

12. Slide the tray from the drawer far enough to disconnect the power cables 
attached to the OCP (cabinet only), the floppy, and the CD-ROM. 

13. As you remove the tray from the system, push the power cables through the hole 
at the back of the tray into the power section of the drawer. 

14. Disconnect the power harness from the power supplies. Remove the harness 
from the system. 

Replacement 


Reverse the steps in the Removal procedure.

7.21 Power Harness (4000) Removal and
Replacement


Figure 7-20 Removing Power Harness

Removal
1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Expose the power and system card cage sections of the drawer by removing the 
two covers. Unscrew the two Phillips head screws holding each cover in place 
and slide the covers off the drawer. 


4. If you want more space to work on the fans, do this step and the next; otherwise 
skip to step 7. Release the power supply tray by removing the two Phillips head 
screws on the side of the drawer. 


5. Lift the power supply tray to release it from the sheet metal and slide it out from 
the drawer until it locks. 


6. Tilt the tray to allow easier access to the fans. 


7. Remove the bracket holding the power harness as it passes from the power 
section to the system card cage section of the drawer. Remove the three Phillips 
head screws holding the bracket in place. 


8. Disconnect the power harness from the system motherboard and fold the harness 
back over the power supplies. 


CAUTION: Secure the power harness connectors in the system card cage to 
ensure that they cannot damage the pins in the CPU connectors. 


9. Disconnect the three power connectors from the PCI/EISA motherboard. Push
the cable through the hole from the PCI/EISA section into the power section.


10. Disconnect the three power connectors from the other PCI motherboard. Push 
the power cable through the hole from the PCI section into the power section. 


11. Disconnect the fan power cables from the power harness. 
12. Remove the four Phillips head screws holding the OCP tray to the drawer. 


13. Slide the tray from the drawer far enough to disconnect the power cables 
attached to the OCP (cabinet only), the floppy, and the CD-ROM. 


14. As you remove the tray from the system, push the power cables through the hole 
at the back of the tray into the power section of the drawer. 


15. Disconnect the power harness from the power supplies. Remove the harness 
from the system. 


Replacement 


Reverse the steps in the Removal procedure. 




7.22 System Drawer Fan Removal and 
Replacement 


Figure 7-21 Removing System Drawer Fan

Removal


1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Expose the power system, the system card cage, and the PCI card cage sections 
of the drawer by removing all three covers. Unscrew the two Phillips head 
screws holding each cover on top of the drawer in place and slide them off the 
drawer. Release the two lever latches holding the PCI card cage cover in place 
and slide it off.

4. Release the power supply tray by removing the two Phillips head screws on the
side of the drawer. 

5. Lift the power supply tray to release it from the sheet metal and slide it out from 
the drawer. 

6. Tilt the tray to allow easier access to the fans. 

7. Remove the bracket holding the power harness as it passes from the power 
section to the system card cage section of the drawer. Remove the three Phillips 
head screws holding the bracket in place. 

8. Disconnect the power harness from the system motherboard and fold the harness 
back over the power supplies. Remove any modules that prevent you from 
disconnecting the harness from the system motherboard. 

CAUTION: Secure the power harness connectors in the system card cage to 
ensure that they cannot damage the pins in the CPU connectors. 

9. Disconnect the three power connectors from the PCI motherboard and pass them 
through the hole from the PCI card cage to the power section of the drawer. 

10. Disconnect the fan power cables from the power harness. 

11. Remove the four Phillips head screws holding the OCP tray to the system 
drawer. Slide the tray out of the system drawer far enough to disconnect power 
cables attached to the OCP, the floppy, and the CD-ROM drive. 

12. Remove the tray from the system. 

13. Release the three lever latches on the bracket holding all three fans in place. 

14. Disconnect the broken fan’s power cable from the power harness and lift the fan 
from the drawer. 

Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system. If the fan you installed is faulty, the system will not power up.
Look at the PCM LEDs to determine that the fan you replaced is functioning
properly.

7.23 Cover Interlock (4100 and early 4000)
Removal and Replacement


Figure 7-22 Removing Cover Interlocks

Removal


1. Shut down the operating system and power down the system.

2. Expose the system drawer.

3. Remove all three section covers to expose the interlock switch assembly.

4. Remove the two screws holding the interlock in place.

5. Push the interlock toward the opposite side of the system drawer (be sure not to
twist it) and tilt it so that the switches affected by the power and system card
cage covers clear the openings in the side of the drawer. Slide it toward the
front of the drawer and remove it, letting it hang loosely over the side of the
drawer.

6. If you are working on a pedestal system, disconnect the switch connection from
the tray bulkhead and remove the interlock switch assembly.

7. If you are working on a system drawer in a cabinet, unscrew the four screws
holding the OCP tray assembly in place beneath the drawer in front.

8. Slide the tray out and remove it from the system.

9. Pull the interlock switch connection to the OCP back through the access hole
and remove the entire switch assembly.


Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system. If the switch you installed is faulty, the system will not power
up.

7.24 Cover Interlock (later 4000) Removal and
Replacement


Figure 7-23 Removing Cover Interlocks

Removal


1. Shut down the operating system and power down the system.

2. Expose the system drawer.

3. Remove all three section covers to expose the interlock switch assemblies.

4. Remove the two screws holding the interlocks in place.

5. Push the interlock toward the opposite side of the system drawer (be sure not to
twist it) and tilt it so that the switches affected by the power and system card
cage covers clear the openings in the side of the drawer. Slide it toward the
front of the drawer and remove it, letting it hang loosely over the side of the
drawer.

6. If you are working on a pedestal system, disconnect the switch connection from
the tray bulkhead and remove the interlock switch assembly.

7. If you are working on a system drawer in a cabinet, unscrew the four screws
holding the OCP tray assembly in place beneath the drawer in front.

8. Slide the tray out and remove it from the system.

9. Pull the interlock switch connection to the OCP back through the access hole
and remove the entire switch assembly.


Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system. If the switch you installed is faulty, the system will not power
up.

7.25 Operator Control Panel Removal and
Replacement (Cabinet)


Figure 7-24 Removing OCP (Cabinet)

Removal
1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. While you need not remove the tray containing the OCP, you do need to slide it 
forward to access the OCP retaining screws under the tray. The tray is attached 
to the power system section cover. To slide the tray forward: 


a. Remove the tray cover by loosening the retaining screws at the back of the 
tray and sliding it toward the back of the system. 


b. Disconnect the cables for the OCP, and for any optional SCSI device in the
tray, from the bulkhead at the rear right of the tray.


c. Unscrew the Phillips head retaining screw holding the bulkhead to the tray. 


d. Unscrew the two Phillips head retaining screws at the front of the system 
drawer and slide the tray forward. 


4. Remove the white power interconnect wire and the signal ribbon cable from the 
OCP. 


5. Remove the two Phillips head screws holding the OCP in place and remove it 
from the tray. 
Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system. If the OCP you installed is faulty, the system will not power 
up.

7.26 Operator Control Panel Removal and
Replacement (Pedestal)


Figure 7-25 Removing OCP (Pedestal)

Removal


1. Shut down the operating system and power down the system. 

2. Expose the system drawer. 

3. Remove the four Phillips head screws holding the OCP tray to the system 
drawer. 

4. Slide the tray out of the system drawer far enough to disconnect cables attached 
to the OCP, the floppy, and the CD-ROM drive. 

5. Remove the tray from the system. 

6. Move the tray to some handy work surface. Hold the tray vertically and remove 
the two Phillips head screws that hold the OCP in place from the bottom of the 
tray and remove the OCP assembly from the tray. 

Replacement 


Reverse the steps in the Removal procedure. As you replace the tray in the drawer, 
be sure that the slides on the sides of the tray are placed on the rails in the drawer. 


Verification 


Power up the system. If the OCP you installed is faulty, the system will not power 
up or you will not see messages on the OCP display.

7.27 Floppy Removal and Replacement


Figure 7-26 Removing Floppy Drive

Removal
1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Remove the four Phillips head screws holding the OCP tray to the system 
drawer. 


4. Slide the tray out of the system drawer and disconnect cables attached to the 
OCP (unnecessary on a pedestal system), the floppy, and the CD-ROM drive. 
(In the pedestal system the OCP is in the tray above the power supplies.) 


5. Move the tray to some handy work surface. Hold the tray vertically and from 
the bottom of the tray remove the four Phillips head screws that hold the floppy 
in place and remove it from the tray. 

Replacement 

Reverse the steps in the Removal procedure. As you replace the tray in the drawer,
be sure that the slides on the sides of the tray are placed on the rails in the drawer.

Verification 

Power up the system. Use the following SRM console commands to test the floppy: 


P00>>> show dev floppy
P00>>> HD buf/dva0

7.28 CD-ROM Removal and Replacement


Figure 7-27 Removing CD-ROM

Removal
1. Shut down the operating system and power down the system. 
2. Expose the system drawer. 


3. Remove the four Phillips head screws holding the OCP tray to the system 
drawer. 


4. Slide the tray out of the system drawer and disconnect cables attached to the 
OCP (unnecessary on a pedestal system), the floppy, and the CD-ROM drive. 
(In the pedestal system the OCP is in the pedestal tray above the power 
supplies.) 


5. Move the tray to some handy work surface. Hold the tray vertically and from
the bottom of the tray remove the four Phillips head screws that hold the CD-ROM
drive in place and remove it from the tray.


Replacement 
Reverse the steps in the Removal procedure. As you replace the tray in the drawer, 
be sure that the slides on the sides of the tray are placed on the rails in the drawer. 


Verification 


Power up the system (press the Halt button if necessary to bring up the SRM 
console). Use the following SRM console commands to test the CD-ROM: 


P00>>> show dev ncr0
P00>>> HD buf/dkannn


where nnn is the device number; for example, dka500. 
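
As a rough illustration of how the device name in the command above is put together, the following Python sketch splits a name such as dka500 into its parts and rebuilds the hex dump command. The names SrmDevice, parse_srm_device, and hex_dump_command are hypothetical, and the split into a two-letter device type, a controller letter, and a unit number is an assumption about the naming pattern, not a definition taken from this manual.

```python
import re
from typing import NamedTuple


class SrmDevice(NamedTuple):
    driver: str      # two-letter device type, e.g. "dk" or "dv" (assumed meaning)
    controller: str  # controller letter, e.g. "a" (assumed meaning)
    unit: int        # unit number, e.g. 500


def parse_srm_device(name: str) -> SrmDevice:
    """Split a name such as 'dka500' into type, controller letter, and unit."""
    match = re.fullmatch(r"([a-z]{2})([a-z])(\d+)", name.strip().lower())
    if match is None:
        raise ValueError(f"not a device name of the expected form: {name!r}")
    driver, controller, unit = match.groups()
    return SrmDevice(driver, controller, int(unit))


def hex_dump_command(name: str) -> str:
    """Build the console command text used in the Verification step."""
    device = parse_srm_device(name)
    return f"HD buf/{device.driver}{device.controller}{device.unit}"
```

For example, hex_dump_command("dka500") yields the text of the second command shown above, with nnn replaced by 500.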

7.29 Cabinet Fan Tray Removal and Replacement


Figure 7-28 Removing Cabinet Fan Tray

Removal


1. Shut down the operating system and power down the system. Unplug the AC 
power cable from the cabinet tray power supply. 


2. If present, unplug any power cables going to the server control modules at the 
back of system drawers. 


3. Unscrew the four Phillips head screws securing the fan tray to the top of the 
cabinet. 


4. Loosen the four hexnuts that hold the tray to the top of the cabinet. 


5. Holding the bottom of the tray, slide it out so that the holes in the tray frame slip 
over the loosened hexnuts. 


6. Move the tray to a work surface to remove whatever component is being 
replaced. 
Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system. If the green power LED comes on, and the fan LED is off, the 
cabinet fan tray is verified.

7.30 Cabinet Fan Tray Power Supply Removal and
Replacement


Figure 7-29 Removing Cabinet Fan Tray Power Supply

Removal
1. Remove the cabinet fan tray. 
2. Disconnect the power harness from the fan fail detect module and each fan. 


3. Remove the power supply cover. It is held in place by two screws that go 
through the AC bulkhead spot welded to the tray weldment. 


4. Remove the power harness from the tray by disconnecting it from the power 
supply. 

5. Disconnect the neutral and load leads from the power supply. 

6. Remove the four screws holding the power supply to the tray. Keep track of the 
standoffs that provide space between the power supply and weldment. You will 
need them during replacement. 

Replacement 

1. Reverse the steps in the Removal procedure. 


2. Place the fan tray back in the cabinet. 


Verification 


Power up the system. If the green power LED comes on, and the fan LED is off, the 
cabinet fan tray power supply is verified. 




7.31 Cabinet Fan Tray Fan Removal and
Replacement


Figure 7-30 Removing Cabinet Fan Tray Fan

Removal


1. Remove the cabinet fan tray. 

2. Disconnect the power harness from the fan you wish to replace. 

3. Remove the fan finger guard. 

4. Remove the two remaining screws holding the fan to the tray and remove the 
fan. 

5. If the new fan does not have clip nuts, remove them from the old fan so that they can be reused.

Replacement 

1. Reverse the Removal procedure, taking care to orient the fan so that the 
connection to the power harness is dressed nicely. 

2. Place the fan tray back in the cabinet. 

Verification 


Power up the system. If the green power LED comes on, and the fan LED is off, the
cabinet fan tray fan is verified. 




7.32 Cabinet Fan Tray Fan Fail Detect Module 
Removal and Replacement 


Figure 7-31 Removing Fan Tray Fan Fail Detect Module 




Removal 

1. Remove the cabinet fan tray. 

2. Disconnect the power harness from the fan fail detect module. 

3. Remove the fan fail detect module. In early systems, the module is held in 
place by three screws that go through the weldment, through three standoffs, 
through the module to nuts. In later systems, the module snaps in place. 

Replacement 

1. Reverse the steps in the Removal procedure. 


2. Place the fan tray back in the cabinet. 


Verification 


Power up the system. If the green power LED comes on, and the fan LED is off, the 
cabinet fan fail detect module is verified. 




7.33 StorageWorks Shelf Removal and 
Replacement 


Figure 7-32 Removing StorageWorks Shelf 


(Figure labels: Cabinet; Pedestal; StorageWorks Shelf Mounting Rails (H910A-EC);
StorageWorks Shelf Mounting Rails (H910A-EB))

Removal


1. Shut down the operating system and power down the system. 

2. Remove the power cord and signal cord(s) from the StorageWorks shelf. 

3. Remove the two retaining brackets holding the shelf in the mounting rail by 
removing the Phillips head screws holding the brackets in place. 

4. Slide the shelf out of the system. 

Replacement 


Reverse the steps in the Removal procedure. 


Verification 


Power up the system. Use the show device console command to verify that the 
StorageWorks shelf is configured into the system. 




Appendix A
Running Utilities


This appendix provides a brief overview of how to load and run utilities. The
following topics are covered:


• Running Utilities from a Graphics Monitor
• Running Utilities from a Serial Terminal
• Running ECU
• Running RAID Standalone Configuration Utility
• Updating Firmware with LFU
• Updating Firmware from AlphaBIOS
• Upgrading AlphaBIOS

A.1 Running Utilities from a Graphics Monitor


Start AlphaBIOS and select Utilities from the menu. The next selection depends 
on the utility to be run. For example, to run ECU, select Run ECU from floppy. 
To run RCU, select Run Maintenance Program. 


Figure A-1 Running a Utility from a Graphics Monitor 


AlphaBIOS Setup                                   F1=Help


Display System Configuration...
Upgrade AlphaBIOS
Hard Disk Setup...
CMOS Setup...
Install Windows NT
Utilities                >   Run ECU from floppy...
About AlphaBIOS...           OS Selection Setup...
                             Run Maintenance Program...

A.2 Running Utilities from a Serial Terminal


Utilities are run from a serial terminal in the same way as from a graphics 
monitor. The menus are the same, but some keys are different. 


Table A-1 AlphaBIOS Option Key Mapping


AlphaBIOS Key    VTxxx Key
F1               Ctrl/A
F2               Ctrl/B
F3               Ctrl/C
F4               Ctrl/D
F5               Ctrl/E
F6               Ctrl/F
F7               Ctrl/P
F8               Ctrl/R
F9               Ctrl/T
F10              Ctrl/U
Insert           Ctrl/V
Delete           Ctrl/W
Backspace        Ctrl/H
Escape           Ctrl/[
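
The key mapping in Table A-1 can be expressed as a small lookup, which may be useful when scripting a serial session. The Python sketch below is illustrative only: KEY_MAP and control_byte are hypothetical names, and the conversion from a Ctrl combination to a single byte assumes the standard ASCII convention in which Ctrl clears the upper three bits of the character code.

```python
# AlphaBIOS key -> character typed together with Ctrl on a serial terminal,
# copied from Table A-1.
KEY_MAP = {
    "F1": "A", "F2": "B", "F3": "C", "F4": "D", "F5": "E",
    "F6": "F", "F7": "P", "F8": "R", "F9": "T", "F10": "U",
    "Insert": "V", "Delete": "W", "Backspace": "H", "Escape": "[",
}


def control_byte(key: str) -> bytes:
    """Return the single ASCII control byte for the mapped Ctrl combination."""
    character = KEY_MAP[key]
    # Assumption: standard ASCII control codes (character code AND 0x1F).
    return bytes([ord(character) & 0x1F])
```

Under that assumption, the entry for Escape (Ctrl/[) yields the ASCII ESC byte and the entry for Backspace (Ctrl/H) yields the ASCII BS byte, which matches what those keys normally send.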

A.3 Running ECU


The EISA Configuration Utility (ECU) is used to configure EISA options on 
AlphaServer systems. The ECU can be run either from a graphics monitor or a 
serial terminal. 


1. Start AlphaBIOS Setup. If the system is in the SRM console, issue the 
command alphabios. (If the system has a graphics monitor, you can set the 
SRM console environment variable to graphics.) 


2. From AlphaBIOS Setup, select Utilities, then select Run ECU from floppy... 
from the submenu that displays, and press Enter. 


NOTE: The EISA Configuration Utility is supplied on diskettes shipped with the 
system. There is a diskette for Microsoft Windows NT and a diskette for 
DIGITAL UNIX and OpenVMS. 


3. Insert the correct ECU diskette for the operating system and press Enter to run 
it. 
The ECU main menu displays the following options:


EISA Configuration Utility
Steps in configuring your computer


STEP 1: Important EISA configuration information
STEP 2: Add or remove boards
STEP 3: View or edit details
STEP 4: Examine required details
STEP 5: Save and exit


NOTE: Step 1 of the ECU provides online help. It is recommended that you select
this step and become familiar with the utility before proceeding.

A.4 Running RAID Standalone Configuration Utility


The RAID Standalone Configuration Utility is used to set up RAID disk drives
and logical units. The Standalone Utility is run from the AlphaBIOS Utility
menu.


The AlphaServer 4100 system supports the KZPSC-xx PCI RAID controller 
(SWXCR). The KZPSC-xx kit includes the controller, RAID Array 230 Subsystems 
software, and documentation. 


1. Start AlphaBIOS Setup. If the system is in the SRM console, issue the
command alphabios. (If the system has a graphics monitor, you can set the
SRM console environment variable to graphics.)


2. At the Utilities screen, select Run Maintenance Program. Press Enter.


3. In the Run Maintenance Program dialog box, type swxcrmgr in the Program
Name: field.


4. Press Enter to execute the program. The Main menu displays the following
options:


[01. View/Update Configuration]
 02. Automatic Configuration
 03. New Configuration
 04. Initialize Logical Drive
 05. Parity Check
 06. Rebuild
 07. Tools
 08. Select SWXCR
 09. Controller Setup
 10. Diagnostics


Refer to the RAID Array Subsystems documentation for information on using the 
Standalone Configuration Utility to set up RAID drives.

A.5 Updating Firmware with LFU


Start the Loadable Firmware Update (LFU) utility by issuing the lfu command
at the SRM console prompt or by selecting Upgrade AlphaBIOS in the
AlphaBIOS Setup screen. LFU is part of the SRM console.


Example A-1 Starting LFU from the SRM Console
P00>>> lfu
***** Loadable Firmware Update Utility *****


Select firmware load device (cda0, dva0, ewa0), or 
Press <return> to bypass loading and proceed to LFU: cda0 


UPD> 


Figure A-2 Starting LFU from the AlphaBIOS Console


AlphaBIOS Setup 


Display System Configuration... 
Hard Disk Setup 

CMOS Setup... 

Install Windows NT 

Utilities > 
About AlphaBIOS... 


Press ENTER to upgrade your AlphaBIOS from floppy or CD-ROM. 


ESC=Exit

Use the Loadable Firmware Update (LFU) utility to update system firmware.
You can start LFU from either the SRM console or the AlphaBIOS console.


• From the SRM console, start LFU by issuing the lfu command.

• From the AlphaBIOS console, select Upgrade AlphaBIOS from the
AlphaBIOS Setup screen (see Figure A-2).


A typical update procedure is: 
1. Start LFU. 


2. Use the LFU list command to show the revisions of modules that LFU can 
update and the revisions of update firmware. 


3. Use the LFU update command to write the new firmware. 
4. Use the LFU exit command to exit back to the console. 
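
The four steps above can be sketched as an ordered list of console commands. This Python fragment is only an illustration of the sequence: the function name lfu_command_sequence is hypothetical, and it simply builds the command text; it does not talk to a console.

```python
def lfu_command_sequence(devices=None):
    """Return the commands of a typical LFU session, in the order they are typed.

    devices: optional list of device names to update one at a time.
             When omitted, the wildcard form of the update command is used.
    """
    commands = ["lfu", "list"]
    if devices:
        commands.extend(f"update {name}" for name in devices)
    else:
        commands.append("update *")
    commands.append("exit")
    return commands
```

Calling lfu_command_sequence() gives the start, list, update-all, and exit sequence; passing a list of device names produces one update command per device instead of the wildcard.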


The sections that follow show examples of updating firmware from the local CD- 
ROM, the local floppy, and a network device. Following the examples is an LFU 
command reference.

A.5.1 Updating Firmware from the Internal CD-ROM


Insert the update CD-ROM, start LFU, and select cda0 as the load device.


Example A-2 Updating Firmware from the Internal CD-ROM
***** Loadable Firmware Update Utility *****


Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: cda0              (1)


Please enter the name of the options firmware files list, or
Press <return> to use the default filename [AS4X00FW]: AS4X00CP        (2)


Copying AS4X00CP from DKA500.5.0.1.1 .
Copying [as4x00]RHREADME from DKA500.5.0.1.1 .
Copying [as4x00]RHSRMROM from DKA500.5.0.1.1 ...................
Copying [as4x00]RHARCROM from DKA500.5.0.1.1 .............

Function    Description                                                (3)

Display     Displays the system’s configuration table.
Exit        Done exit LFU (reset).
List        Lists the device, revision, firmware name, and
            update revision.
Lfu         Restarts LFU.
Readme      Lists important release information.
Update      Replaces current firmware with loadable data image.
Verify      Compares loadable and hardware images.
? or Help   Scrolls this function table.


UPD> list                                                              (4)
Device      Current Revision   Filename   Update Revision
AlphaBIOS   V5.12-2            arcrom     V6.40-1
srmflash    V1.0-9             srmrom     V2.0-3


(1) Select the device from which firmware will be loaded. The choices are the
    internal CD-ROM, the internal floppy disk, or a network device. In this
    example, the internal CD-ROM is selected.


(2) Select the file that has the firmware update, or press Enter to select the default
    file. The file options are:


    AS4X00FW (default)   SRM console, AlphaBIOS console, and I/O adapter
                         firmware
    AS4X00CP             SRM console and AlphaBIOS console firmware only
    AS4X00IO             I/O adapter firmware only


    In this example the file for console firmware (AlphaBIOS and SRM) is
    selected.


(3) The LFU function table and prompt (UPD>) display.


(4) Use the LFU list command to determine the revision of firmware in a device
    and the most recent revision of that firmware available in the selected file. In
    this example, the resident firmware for each console (SRM and AlphaBIOS) is
    at an earlier revision than the firmware in the update file.
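
To show how the list output can be read, the following Python sketch parses rows of the kind printed by the LFU list command and reports which devices have an update revision that differs from the current one. The function names are hypothetical, the parsing assumes whitespace-separated columns as in the examples in this appendix, and the comparison is a simple inequality rather than a true ordering of revision strings.

```python
def parse_lfu_list(text):
    """Parse LFU 'list' output into a list of dictionaries, one per device row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the prompt line, the header row, and rows that
        # name only a firmware file (no device or current revision).
        if len(parts) < 4 or parts[0] == "Device":
            continue
        rows.append({
            "device": parts[0],
            "current": parts[1],
            "filename": parts[2],
            # "Missing file" is two words, so join everything that remains.
            "update": " ".join(parts[3:]),
        })
    return rows


def devices_needing_update(rows):
    """Return devices whose update revision exists and differs from the current one."""
    return [
        row["device"]
        for row in rows
        if row["update"] != "Missing file"
        and row["update"].upper() != row["current"].upper()
    ]
```

Applied to the two device rows in Example A-2, this sketch would report both AlphaBIOS and srmflash, since each has an update revision different from its current revision.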


Example A-2 Updating Firmware from the Internal CD-ROM
(Continued)


UPD> update *                                                          (5)
WARNING: updates may take several minutes to complete for each
device.


Confirm update on: AlphaBIOS [Y/(N)] y                                 (6)
DO NOT ABORT!
AlphaBIOS   Updating to V6.40-1... Verifying V6.40-1... PASSED.
Confirm update on: srmflash [Y/(N)] y
DO NOT ABORT!
srmflash    Updating to V2.0-3... Verifying V2.0-3... PASSED.
UPD> exit                                                              (7)

(5) The update command updates the device specified or all devices. In this
    example, the wildcard indicates that all devices supported by the selected
    update file will be updated.


(6) For each device, you are asked to confirm that you want to update the
    firmware. The default is no. Once the update begins, do not abort the
    operation. Doing so will corrupt the firmware on the module.


(7) The exit command returns you to the console from which you entered LFU
    (either SRM or AlphaBIOS).

A.5.2 Updating Firmware from the Internal Floppy Disk:
Creating the Diskettes


Create the update diskettes before starting LFU. See Section A.5.3 for an 
example of the update procedure. 


Table A-2 File Locations for Creating Update Diskettes on a PC


Console Update Diskette    I/O Update Diskette
AS4X00FW.TXT               AS4X00IO.TXT
AS4X00CP.TXT               RHREADME.SYS
RHREADME.SYS               CIPCA214.SYS
RHSRMROM.SYS               DFPAA246.SYS
RHARCROM.SYS               KZPSAA10.SYS


To update system firmware from floppy disk, you first must create the firmware 
update diskettes. You will need to create two diskettes: one for console updates, and 
one for I/O. 


1. Download the update files from the Internet (see the Preface of this book). 
2. On a PC, copy files onto two FAT-formatted diskettes.


From an OpenVMS system, copy files onto two ODS2-formatted diskettes as 
shown in Example A-3.

Example A-3 Creating Update Diskettes on an OpenVMS System


Console Update Diskette


$ inquire ignore "Insert blank HD floppy in DVA0, then continue"
$ set verify
$ set proc/priv=all
$ init /density=hd/index=begin dva0: rhods2cp
$ mount dva0: rhods2cp
$ create /directory dva0:[as4x00]
$ copy as4x00fw.sys dva0:[as4x00]as4x00fw.sys
$ copy as4x00cp.sys dva0:[as4x00]as4x00cp.sys
$ copy rhreadme.sys dva0:[as4x00]rhreadme.sys
$ copy as4x00fw.txt dva0:[as4x00]as4x00fw.txt
$ copy as4x00cp.txt dva0:[as4x00]as4x00cp.txt
$ copy rhsrmrom.sys dva0:[as4x00]rhsrmrom.sys
$ copy rharcrom.sys dva0:[as4x00]rharcrom.sys
$ dismount dva0:
$ set noverify
$ exit


I/O Update Diskette


$ inquire ignore "Insert blank HD floppy in DVA0, then continue"
$ set verify
$ set proc/priv=all
$ init /density=hd/index=begin dva0: rhods2io
$ mount dva0: rhods2io
$ create /directory dva0:[as4x00]
$ create /directory dva0:[options]
$ copy as4x00fw.sys dva0:[as4x00]as4x00fw.sys
$ copy as4x00io.sys dva0:[as4x00]as4x00io.sys
$ copy rhreadme.sys dva0:[as4x00]rhreadme.sys
$ copy as4x00fw.txt dva0:[as4x00]as4x00fw.txt
$ copy as4x00io.txt dva0:[as4x00]as4x00io.txt
$ copy cipca214.sys dva0:[options]cipca214.sys
$ copy dfpaa246.sys dva0:[options]dfpaa246.sys
$ copy kzpsaa10.sys dva0:[options]kzpsaa10.sys
$ dismount dva0:
$ set noverify
$ exit
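
The two command procedures in Example A-3 follow the same pattern: initialize and mount the diskette, create the directories, copy each file, and dismount. The Python sketch below generates command lines in that pattern from a file manifest. It is a convenience illustration under stated assumptions: the function name diskette_commands and the manifest layout are hypothetical, the output is plain text for review, and the placement of the closing dismount, set noverify, and exit lines is an assumed ordering.

```python
def diskette_commands(volume_label, files_by_directory):
    """Return DCL command lines that build one update diskette on dva0.

    volume_label:       volume label given to init and mount, e.g. "rhods2cp"
    files_by_directory: dict mapping a directory name to the files copied into it
    """
    lines = [
        '$ inquire ignore "Insert blank HD floppy in DVA0, then continue"',
        "$ set verify",
        "$ set proc/priv=all",
        f"$ init /density=hd/index=begin dva0: {volume_label}",
        f"$ mount dva0: {volume_label}",
    ]
    # Create every target directory before copying any files.
    for directory in files_by_directory:
        lines.append(f"$ create /directory dva0:[{directory}]")
    for directory, names in files_by_directory.items():
        for name in names:
            lines.append(f"$ copy {name} dva0:[{directory}]{name}")
    lines += ["$ dismount dva0:", "$ set noverify", "$ exit"]
    return lines


# Manifest for the console update diskette, using the file names in Example A-3.
CONSOLE_FILES = {
    "as4x00": [
        "as4x00fw.sys", "as4x00cp.sys", "rhreadme.sys",
        "as4x00fw.txt", "as4x00cp.txt", "rhsrmrom.sys", "rharcrom.sys",
    ],
}
```

For instance, print("\n".join(diskette_commands("rhods2cp", CONSOLE_FILES))) lists the commands for the console update diskette; a second manifest with the as4x00 and options directories would cover the I/O diskette.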

A.5.3 Updating Firmware from the Internal Floppy Disk:
Performing the Update


Insert an update diskette (see Section A.5.2) into the internal floppy drive. Start
LFU and select dva0 as the load device.


Example A-4 Updating Firmware from the Internal Floppy Disk
***** Loadable Firmware Update Utility *****


Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: dva0              (1)


Please enter the name of the options firmware files list, or
Press <return> to use the default filename [AS4X00IO, (AS4X00CP)]:
AS4X00IO                                                               (2)


Copying AS4X00IO from DVA0 .
Copying RHREADME from DVA0 .
Copying CIPCA214 from DVA0 .
Copying DFPAA252 from DVA0 ...
Copying KZPSAA11 from DVA0 ...


(The function table displays, followed by the UPD> prompt, as
shown in Example A-2.)


UPD> list                                                              (3)
Device      Current Revision   Filename   Update Revision
AlphaBIOS   V5.12-3            arcrom     Missing file
pfid        2.46               dfpaa_fw   2.52
srmflash    T3.2-21            srmrom     Missing file
                               cipca_fw   A214
                               kzpsa_fw   A11


(1) Select the device from which firmware will be loaded. The choices are the
    internal CD-ROM, the internal floppy disk, or a network device. In this
    example, the internal floppy disk is selected.


(2) Select the file that has the firmware update, or press Enter to select the default
    file. When the internal floppy disk is the load device, the file options are:


    AS4X00CP (default)   SRM console and AlphaBIOS console firmware only
    AS4X00IO             I/O adapter firmware only


    The default option in Example A-2 (AS4X00FW) is not available, since the file
    is too large to fit on a 1.44 MB diskette. This means that when a floppy disk is
    the load device, you can update either console firmware or I/O adapter
    firmware, but not both in the same LFU session. If you need to update both,
    after finishing the first update, restart LFU with the lfu command and insert the
    floppy disk with the other file.


    In this example the file for I/O adapter firmware is selected.


(3) Use the LFU list command to determine the revision of firmware in a device
    and the most recent revision of that firmware available in the selected file. In
    this example, the update revision for console firmware displays as “Missing
    file” because only the I/O firmware files are available on the floppy disk.
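
The relationship between the load device and the available firmware files list can be summarized in a short lookup. The Python sketch below is an illustration only: FILE_LISTS and plan_sessions are hypothetical names, only the CD-ROM and floppy cases described in this appendix are covered, and the sketch models the restart-with-lfu method described above, with one LFU session per returned file name.

```python
# Firmware files lists offered for each load device, as described in
# Sections A.5.1 and A.5.3.
FILE_LISTS = {
    "cda0": {"both": "AS4X00FW", "console": "AS4X00CP", "io": "AS4X00IO"},
    "dva0": {"console": "AS4X00CP", "io": "AS4X00IO"},
}


def plan_sessions(load_device, console=True, io=True):
    """Return the files list to name in each LFU session, one entry per session."""
    available = FILE_LISTS[load_device]
    wanted = [kind for kind, flag in (("console", console), ("io", io)) if flag]
    # A combined file covers both kinds of firmware in a single session.
    if len(wanted) == 2 and "both" in available:
        return [available["both"]]
    # Otherwise each kind of firmware needs its own session.
    return [available[kind] for kind in wanted]
```

With the floppy as the load device, plan_sessions("dva0") returns two entries, reflecting the need to restart LFU and insert the second diskette; the order of the two sessions is arbitrary.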


Example A-4 Updating Firmware from the Internal Floppy Disk
(Continued)


UPD> update pfid                                                       (4)
WARNING: updates may take several minutes to complete for each
device.


Confirm update on: pfid [Y/(N)] y                                      (5)
DO NOT ABORT!
pfid        Updating to 2.52... Verifying to 2.52... PASSED.

UPD> lfu                                                               (6)


***** Loadable Firmware Update Utility *****


Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: dva0


Please enter the name of the options firmware files list, or
Press <return> to use the default filename [AS4X00IO, (AS4X00CP)]:     (7)


(The function table displays, followed by the UPD> prompt.
Console firmware can now be updated.)


UPD> exit                                                              (8)

(4) The update command updates the device specified or all devices.


(5) For each device, you are asked to confirm that you want to update the
    firmware. The default is no. Once the update begins, do not abort the
    operation. Doing so will corrupt the firmware on the module.


(6) The lfu command restarts the utility so that console firmware can be updated.
    (Another method is shown in Example A-5, where the user specifies the file
    AS4X00FW and is prompted to insert the second diskette.)


(7) The default update file, AS4X00CP, is selected. The console firmware can now
    be updated, using the same procedure as for the I/O firmware.


(8) The exit command returns you to the console from which you entered LFU
    (either SRM or AlphaBIOS).


Example A-5 Selecting AS4X00FW to Update Firmware from the
Internal Floppy Disk


P00>>> lfu
***** Loadable Firmware Update Utility *****


Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: dva0


Please enter the name of the firmware files list, or
Press <return> to use the default filename [AS4X00IO, (AS4X00CP)]: as4x00fw


Copying AS4X00FW from DVA0 .

Copying RHREADMF from DVA0 .

Copying RHSRMROM from DVA0 ...........................
Copying RHARCROM from DVA0 ...............

Copying CIPCA214 from DVA0

Please insert next floppy containing the firmware,
Press <return> when ready. Or type DONE to abort.
Copying CIPCA214 from DVA0 .

Copying DFPAA246 from DVA0 ...

Copying KZPSAA10 from DVA0 ...

A.5.4 Updating Firmware from a Network Device


Copy files to the local MOP server’s MOP load area, start LFU, and select ewa0 
as the load device. 


Example A-6 Updating Firmware from a Network Device 
***** Loadable Firmware Update Utility *****


Select firmware load device (cda0, dva0, ewa0), or
Press <return> to bypass loading and proceed to LFU: ewa0 ①


Please enter the name of the options firmware files list, or
Press <return> to use the default filename [AS4X00FW]: ②


Copying AS4X00FW from EWA0 .

Copying RHREADMF from EWA0 .

Copying RHSRMROM from EWA0 ...........................
Copying RHARCROM from EWA0 ............

Copying CIPCA214 from EWA0 .

Copying DFPAA246 from EWA0 ...

Copying KZPSAA11 from EWA0 ...


[The function table displays, followed by the UPD> prompt, as
shown in Example A-2.]


UPD> list ③
Device     Current Revision  Filename   Update Revision
AlphaBIOS  V5.12-2           arcrom     V6.40-1
kzpsa0     A10               kzpsa_fw   A11
kzpsa1     A10               kzpsa_fw   A11
srmflash   V1.0-9            srmrom     V2.0-3
                             cipca_fw   A214
                             dfpaa_fw   2.46


Before starting LFU, download the update files from the Internet (see Preface). You
will need the files with the extension .SYS. Copy these files to your local MOP 
server’s MOP load area. 


① Select the device from which firmware will be loaded. The choices are the
internal CD-ROM, the internal floppy disk, or a network device. In this
example, a network device is selected.


② Select the file that has the firmware update, or press Enter to select the default
file. The file options are:


AS4X00FW (default)   SRM console, AlphaBIOS console, and I/O adapter firmware
AS4X00CP             SRM console and AlphaBIOS console firmware only
AS4X00IO             I/O adapter firmware only


In this example the default file, which has both console firmware (AlphaBIOS
and SRM) and I/O adapter firmware, is selected.


③ Use the LFU list command to determine the revision of firmware in a device
and the most recent revision of that firmware available in the selected file. In
this example, the resident firmware for each console (SRM and AlphaBIOS)
and I/O adapter is at an earlier revision than the firmware in the update file.
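If the list output is captured from a terminal session, the comparison of resident and update revisions can be automated. The following Python sketch assumes the whitespace-separated layout shown in Example A-6; the function names are hypothetical, and the reading of two-column rows as "update file present, no matching device listed" is an assumption, not something the utility documents.

```python
def parse_lfu_list(text):
    """Parse captured list output into (device, current, filename, update) rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Device":
            continue  # skip blank lines and the column header
        if len(fields) == 4:
            rows.append(tuple(fields))
        elif len(fields) == 2:
            # Assumed meaning: a file in the update set with no device shown.
            rows.append((None, None, fields[0], fields[1]))
    return rows


def needs_update(rows):
    """Devices whose resident revision differs from the update revision."""
    return [dev for dev, cur, _name, new in rows if dev and cur != new]


SAMPLE = """\
Device     Current Revision  Filename   Update Revision
AlphaBIOS  V5.12-2           arcrom     V6.40-1
kzpsa0     A10               kzpsa_fw   A11
kzpsa1     A10               kzpsa_fw   A11
srmflash   V1.0-9            srmrom     V2.0-3
                             cipca_fw   A214
                             dfpaa_fw   2.46
"""

if __name__ == "__main__":
    print(needs_update(parse_lfu_list(SAMPLE)))
```

The sketch only tests whether the two revision strings differ; it does not try to decide which revision is newer, because the revision formats vary by device.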


Example A-6 Updating Firmware from a Network Device (Continued)


UPD> update * -all ④
WARNING: updates may take several minutes to complete for each
device.


DO NOT ABORT!

AlphaBIOS Updating to V6.40-1... Verifying V6.40-1... PASSED.
DO NOT ABORT!

kzpsa0 Updating to A11 ... Verifying A11... PASSED.
DO NOT ABORT!

kzpsa1 Updating to A11 ... Verifying A11... PASSED.
DO NOT ABORT!

srmflash Updating to V2.0-3... Verifying V2.0-3... PASSED.

UPD> exit ⑤
④ The update command updates the device specified or all devices. In this
example, the wildcard indicates that all devices supported by the selected
update file will be updated. Typically, LFU requests confirmation before
updating each console’s or device’s firmware. The -all option removes the
update confirmation requests.


⑤ The exit command returns you to the console from which you entered LFU
(either SRM or AlphaBIOS).

A.5.5 LFU Commands


The commands summarized in Table A-3 are used to update system firmware. 


Table A-3 LFU Command Summary


Command   Function

display   Shows the system physical configuration.
exit      Terminates the LFU program.
help      Displays the LFU command list.
lfu       Restarts the LFU program.
list      Displays the inventory of update firmware on the selected device.
readme    Lists release notes for the LFU program.
update    Writes new firmware to the module.
verify    Reads the firmware from the module into memory and compares it
          with the update firmware.


These commands are described in the following pages. 

display

The display command shows the system physical configuration. Display is 
equivalent to issuing the SRM console command show configuration. Because it 
shows the slot for each module, display can help you identify the location of a 
device. 


exit 
The exit command terminates the LFU program, causes system initialization and 
testing, and returns the system to the console from which LFU was called. 


help 


The help (or ?) command displays the LFU command list, shown below. 


Function Description 

Display Displays the system’s configuration table. 

Exit Done exit LFU (reset). 

List Lists the device, revision, firmware name, and update 
revision. 

Lfu Restarts LFU. 

Readme Lists important release information. 

Update Replaces current firmware with loadable data image. 

Verify Compares loadable and hardware images. 


? or Help Scrolls this function table. 


lfu


The lfu command restarts the LFU program. This command is used when the update
files are on a floppy disk. The files for updating both console firmware and I/O 
firmware are too large to fit on a 1.44 MB disk, so only one type of firmware can be 
updated at a time. Restarting LFU enables you to specify another update file. 

list

The list command displays the inventory of update firmware on the CD-ROM,
network, or floppy. Only the devices listed at your terminal are supported for
firmware updates.


The list command shows three pieces of information for each device:

• Current Revision: The revision of the device’s current firmware
• Filename: The name of the file used to update that firmware
• Update Revision: The revision of the firmware update image


readme 


The readme command lists release notes for the LFU program. 


update 


The update command writes new firmware to the module. Then LFU automatically 
verifies the update by reading the new firmware image from the module into 
memory and comparing it with the source image. 


To update more than one device, you may use a wildcard but not a list. For example, 
update k* updates all devices with names beginning with k, and update * updates 
all devices. When you do not specify a device name, LFU tries to update all devices; 
it lists the selected devices to update and prompts before devices are updated. (The 
default is no.) The -all option removes the update confirmation requests, enabling 
the update to proceed without operator intervention. 
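The wildcard selection described above can be illustrated with shell-style pattern matching. This Python sketch is only an approximation: it assumes that LFU wildcards behave like the `*` in `fnmatch` and that matching is case sensitive, neither of which is stated in this manual. The device names are taken from Example A-6.

```python
from fnmatch import fnmatchcase


def select_devices(pattern, devices):
    """Return the devices that an 'update <pattern>' command would select."""
    return [name for name in devices if fnmatchcase(name, pattern)]


DEVICES = ["AlphaBIOS", "kzpsa0", "kzpsa1", "srmflash"]

if __name__ == "__main__":
    print(select_devices("k*", DEVICES))  # devices whose names begin with k
    print(select_devices("*", DEVICES))   # every device in the list
```

A comma-separated list such as "kzpsa0,kzpsa1" matches nothing in this model, which is consistent with the rule that a wildcard may be used but a list may not.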


CAUTION: Never abort an update operation. Aborting corrupts the firmware on the 
module. 


verify 

The verify command reads the firmware from the module into memory and 
compares it with the update firmware. If a module already verified successfully 
when you updated it, but later failed tests, you can use verify to tell whether the 
firmware has become corrupted. 
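Conceptually, verification is a byte-for-byte comparison of two images. The following Python sketch shows that idea only; it is not the LFU implementation, and the function name and the use of in-memory byte strings are assumptions made for illustration.

```python
def first_mismatch(module_image: bytes, update_image: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(module_image, update_image)):
        if a != b:
            return offset
    if len(module_image) != len(update_image):
        # One image is a strict prefix of the other.
        return min(len(module_image), len(update_image))
    return None


if __name__ == "__main__":
    good = bytes([0x10, 0x20, 0x30])
    bad = bytes([0x10, 0x21, 0x30])
    print(first_mismatch(good, good))  # None: images match
    print(first_mismatch(bad, good))   # 1: second byte differs
```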

A.6 Updating Firmware from AlphaBIOS


Insert the CD-ROM or diskette with the updated firmware and select Upgrade 
AlphaBIOS from the main AlphaBIOS Setup screen. Use the Loadable 
Firmware Update (LFU) utility to perform the update. The LFU exit command 
causes a system reset. 


Figure A-3 AlphaBIOS Setup Screen


AlphaBIOS Setup 


Display System Configuration... 
Hard Disk Setup 
CMOS Setup... 
Install Windows NT 
Utilities 

About AlphaBIOS... 




Press ENTER to upgrade your AlphaBIOS from floppy or CD-ROM. 


ESC=Exit 

Upgrading AlphaBIOS

As new versions of Windows NT are released, it might be necessary to upgrade 
AlphaBIOS to the latest version. Additionally, as improvements are made to 
AlphaBIOS, it might be desirable to upgrade to take advantage of new AlphaBIOS 
features. 


Use this procedure to upgrade from an earlier version of AlphaBIOS: 


1. Insert the diskette or CD-ROM containing the AlphaBIOS upgrade.

2. If you are not already running AlphaBIOS Setup, start it by restarting your
   system and pressing F2 when the Boot screen is displayed.

3. In the main AlphaBIOS Setup screen, select Upgrade AlphaBIOS and press
   Enter.

   The system is reset and the Loadable Firmware Update (LFU) utility is started.
   See Section A.5.5 for LFU commands.

4. When the upgrade is complete, issue the LFU exit command. The system is
   reset and you are returned to AlphaBIOS.

   If you press the Reset button instead of issuing the LFU exit command, the
   system is reset and you are returned to LFU.




Appendix B 


SRM Console Commands and
Environment Variables


This appendix provides a summary of the SRM console commands and environment 
variables. The test command is described in Chapter 3 of this document. For 
complete reference information on the other SRM commands and environment 
variables, see the AlphaServer 4000/4100 System Drawer User’s Guide. 


NOTE: It is recommended that you keep a list of the environment variable settings 
for systems that you service, because you will need to restore certain environment 
variable settings after swapping modules. Refer to Table B-3 for a convenient 
worksheet. 

B.1 Summary of SRM Console Commands


The SRM console commands are used to examine or modify the system state. 


Table B-1 Summary of SRM Console Commands 


Command      Function

alphabios    Loads and starts the AlphaBIOS console.
boot         Loads and starts the operating system.
clear envar  Resets an environment variable to its default value.
continue     Resumes program execution.
crash        Forces a crash dump at the operating system level.
deposit      Writes data to the specified address.
edit         Invokes the console line editor on a RAM file or on the nvram file
             (power-up script).
examine      Displays the contents of a memory location, register, or device.
halt         Halts the specified processor. (Same as stop.)
help         Displays information about the specified console command.
info num     Displays various types of information about the system:
             Info shows a list describing the num qualifier.
             Info 3 reads the impure area that contains the state of the CPU
             before it entered PAL mode.
             Info 5 reads the PAL-built logout area that contains the data used
             by the operating system to create the error entry.
             Info 8 reads the IOD and IOD1 registers.
initialize   Resets the system.
lfu          Runs the Loadable Firmware Update Utility.


Table B-1 Summary of SRM Console Commands (Continued)


Command           Function

man               Displays information about the specified console command.
more              Displays a file one screen at a time.
prcache           Initializes and displays status of the PCI NVRAM.
set envar         Sets or modifies the value of an environment variable.
set host          Connects to an MSCP DUP server on a DSSI device.
set rem_dialout   Sets a modem dialout string.
show envar        Displays the state of the specified environment variable.
show config       Displays the configuration at the last system initialization.
show cpu          Displays the state of each processor in the system.
show device       Displays a list of controllers and their devices in the system.
show fru          Displays the serial number and revision level of system bus
                  options.
show memory       Displays memory module information.
show network      Displays the state of network devices in the system.
show pal          Displays the version of the privileged architecture library code
                  (PALcode).
show power        Displays information about the power supplies, system fans,
                  CPU fans, and temperature.
show rem_dialout  Displays the modem dialout string.
show version      Displays the version of the console program.
start             Starts a program that was previously loaded on the processor
                  specified.
stop              Halts the specified processor. (Same as halt.)
test              Runs firmware diagnostics for the system.

B.2 Summary of SRM Environment Variables


Environment variables pass configuration information between the console and 
the operating system. Their settings determine how the system powers up, boots 
the operating system, and operates. Environment variables are set or changed 
with the set envar command and returned to their default values with the clear 
envar command. Their values are viewed with the show envar command. The 
SRM environment variables are specific to the SRM console. 
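The relationship between set, clear, and show can be pictured with a small model. The Python sketch below is not SRM console code: the class name is hypothetical and the default values are placeholders chosen for illustration, not the real defaults of any variable.

```python
class EnvVars:
    """Toy model of the set, clear, and show operations on environment variables."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._values = dict(defaults)

    def set(self, name, value):
        """Model of 'set envar': store a new value."""
        self._values[name] = value

    def clear(self, name):
        """Model of 'clear envar': return the variable to its default value."""
        self._values[name] = self._defaults[name]

    def show(self, name):
        """Model of 'show envar': report the current value."""
        return self._values[name]


if __name__ == "__main__":
    # Placeholder defaults, for illustration only.
    env = EnvVars({"auto_action": "placeholder-default"})
    env.set("auto_action", "example-value")
    print(env.show("auto_action"))
    env.clear("auto_action")
    print(env.show("auto_action"))
```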


Table B-2 Environment Variable Summary 


Environment
Variable           Function

auto_action        Specifies the console’s action at power-up, a failure, or a
                   reset.
bootdef_dev        Specifies the default boot device string.
boot_osflags       Specifies the default operating system boot flags.
com2_baud          Changes the default baud rate of the COM2 serial port.
console            Specifies the device on which power-up output is displayed
                   (serial terminal or graphics monitor).
cpu_enabled        Enables or disables a specific secondary CPU.
ew*0_mode          Specifies the connection type of the default Ethernet
                   controller.
ew*0_protocols     Specifies network protocols for booting over the Ethernet
                   controller.
kbd_hardware_type  Specifies the default console keyboard type.
kzpsa*_host_id     Specifies the default value for the KZPSA host SCSI bus
                   node ID.
language           Specifies the console keyboard layout.

Table B-2 Environment Variable Summary (Continued)


Environment
Variable         Function

memory_test      Specifies the extent to which memory will be tested. For
                 DIGITAL UNIX systems only.
ocp_text         Overrides the default OCP display text with specified text.
os_type          Specifies the operating system and sets the appropriate
                 console interface.
pci_parity       Disables or enables parity checking on the PCI bus.
pk*0_fast        Enables fast SCSI mode.
pk*0_host_id     Specifies the default value for a controller host bus node ID.
pk*0_soft_term   Enables or disables SCSI terminators on systems that use the
                 QLogic ISP1020 SCSI controller.
sys_model_num    Displays the system model number and computes certain
                 information passed to the operating system. Must be restored
                 after a PCI motherboard is replaced.
sys_serial_num   Restores the system serial number. Must be set if the system
                 motherboard is replaced.
sys_type         Displays the system type and computes certain information
                 passed to the operating system. Must be restored after a PCI
                 motherboard is replaced.
tga_sync_green   Specifies the location of the SYNC signal generated by the
                 DIGITAL ZLXp-E PCI graphics accelerator option.
tt_allow_login   Enables or disables login to the SRM console firmware on
                 other console ports.

B.3 Recording Environment Variables


You can make copies of the table below to record environment variable settings 
for specific systems. Write the system name in the column provided. Enter the 
show * command to list the system settings.


Table B-3 Environment Variables Worksheet


Environment
Variable           System Name        System Name        System Name

auto_action
bootdef_dev
boot_osflags
com2_baud
console
cpu_enabled
ew*0_mode
ew*0_protocols
kbd_hardware_type
kzpsa*_host_id
language
memory_test
ocp_text
os_type
pci_parity
pk*0_fast
pk*0_host_id
pk*0_soft_term
Table B-3 Environment Variables Worksheet (Continued)


Environment
Variable           System Name        System Name        System Name

sys_model_num
sys_serial_num
sys_type
tga_sync_green
tt_allow_login
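Once settings have been recorded, comparing them with the values read back after a module swap shows which variables need to be restored. The Python sketch below assumes the settings have already been transcribed into dictionaries; the function name and all values shown are hypothetical.

```python
def changed_settings(recorded, current):
    """Return {name: (recorded value, current value)} for variables that differ."""
    names = sorted(set(recorded) | set(current))
    return {
        name: (recorded.get(name), current.get(name))
        for name in names
        if recorded.get(name) != current.get(name)
    }


if __name__ == "__main__":
    # Hypothetical worksheet entries and post-swap readings.
    before = {"os_type": "example-os", "sys_serial_num": "EXAMPLE123"}
    after = {"os_type": "example-os", "sys_serial_num": ""}
    for name, (old, new) in changed_settings(before, after).items():
        print(f"{name}: recorded {old!r}, now {new!r}")
```

A variable that is missing on one side is reported with None for that side, so a variable that disappeared after the swap is not silently ignored.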

Appendix C
Operating the System Remotely 


This appendix describes how to use the remote console monitor (RCM) to monitor 
and control the system remotely.


C.1 RCM Console Overview 


The remote console monitor (RCM) is used to monitor and control the system 
remotely. The RCM resides on the server control module and allows the system 
administrator to connect remotely to a managed system through a modem, using a 
serial terminal or terminal emulator. 


The RCM has special console firmware that is used to remotely control an 
AlphaServer system. The RCM firmware resides on an independent microprocessor. 
It is not part of the SRM console that resides in the flash ROM. The RCM firmware 
has its own command interface that allows the user to perform the tasks that can 
usually be done from the system’s serial console terminal. RCM console commands 
are used to reset, halt, and power the system on or off, regardless of the operating 
system or hardware state. The RCM console commands are also used to monitor the 
power supplies, temperature, and fans. 


The user can enter the RCM console either remotely or through the local serial 
console terminal. Once in command mode, the user can enter commands to control 
and monitor the system. 


• To enter the RCM console remotely, the user dials in through a modem, enters a
  password, and then types a special escape sequence that invokes RCM command
  mode. You must set up the modem before you can dial in remotely. See
  Section C.1.1.

• To enter the RCM console locally, the user types the escape sequence at the
  SRM console prompt on the local serial console terminal.


The RCM also provides an autonomous dial-out capability when it detects a power 
failure within the system. When triggered, the RCM dials a paging service at
30-minute intervals until the administrator clears the alert within the RCM.

C.1.1 Modem Usage


To use the RCM to monitor a system remotely, first make the connections to the 
server control module, as shown below. Then configure the modem port for 
dial-in. 


Figure C-1 RCM Connections

[Figure: connections from the server control module to the console terminal and
the modem. PK-0651-96]

Modem Selection


The RCM requires a Hayes-compatible modem. The controls that the RCM sends to 
the modem have been selected to be acceptable to a wide selection of modems. The 
modems that have been tested and qualified include: 


Motorola LifeStyle Series 28.8 
AT&T DATAPORT 14.4/FAX 
Zoom Model 360 


The U.S. Robotics Sportster DATA/FAX MODEM is also supported, but requires 
some modification of the modem initialization and answer strings. See Section 
C.1.7. 

Modem Configuration Procedure 


1. Connect a Hayes-compatible modem to the RCM as shown in Figure C-1, and 
power up the modem. 


2. From the local serial console terminal, enter the RCM firmware console by 
typing the following escape sequence: 


^]^]rcm


The character “^]” is created by simultaneously holding down the Ctrl key and
pressing the ] key (right square bracket). The firmware prompt, RCM>, should
now be displayed.


3. Enter a modem password with the setpass command. See Section C.1.3.14.

4. Enable the modem port with the enable command. See Section C.1.3.5.

5. Enter the quit command to leave the RCM console.


You are now ready to dial in remotely. 




Dialing In to the RCM Modem Port 


1. Dial the modem connected to the server control module. The RCM answers the 
call and after a few seconds prompts for a password with a “#” character.


2. Enter the password that was loaded using the setpass command. The user has 
three tries to correctly enter the password. On the third unsuccessful attempt, 
the connection is terminated, and as a security precaution, the modem is not 
answered again for 5 minutes. 


On successful entry of the password, the RCM banner message “RCM V1.0” is 
displayed, and the user is connected to the system COM1 port. At this point the
local terminal keyboard is disabled except for entering the RCM console 
firmware. The local terminal displays all the terminal traffic going out to the 
modem. 


3. To connect to the RCM firmware console, type the RCM escape sequence.
Refer to Example C-1 for an example of the modem dial-in procedure. 


Example C-1 Sample Remote Dial-In Dialog 


ATQ0V1E1S0=0     When modem dial-in connection is made, a screen display
OK               similar to this appears.
ATDT30167
CONNECT 9600

#                Enter password at this prompt.

RCM V1.0         RCM banner is displayed.

^]^]rcm          Enter the escape sequence after the banner is displayed.
                 The escape sequence is not echoed on the terminal.

RCM>             RCM prompt is displayed. Commands to control and
                 monitor the system can be entered.
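The password rules in the dial-in procedure (three tries, then a 5-minute period during which the modem is not answered) can be modeled as a small state machine. The Python sketch below is an illustration only, not RCM firmware: the class name, the outcome strings, and the use of a caller-supplied clock value are assumptions.

```python
MAX_TRIES = 3              # password attempts allowed per call
LOCKOUT_SECONDS = 5 * 60   # modem is not answered again for 5 minutes


class DialIn:
    """Toy model of RCM dial-in password handling."""

    def __init__(self, password):
        self._password = password
        self._locked_until = 0.0

    def call(self, attempts, now):
        """Simulate one call placed at time 'now' (seconds)."""
        if now < self._locked_until:
            return "no answer"
        tried = attempts[:MAX_TRIES]
        if self._password in tried:
            return "connected"
        if len(tried) == MAX_TRIES:
            # Third unsuccessful attempt: drop the line and start the lockout.
            self._locked_until = now + LOCKOUT_SECONDS
            return "terminated"
        return "caller hung up"


if __name__ == "__main__":
    rcm = DialIn("example-password")
    print(rcm.call(["a", "b", "c"], now=0))          # terminated
    print(rcm.call(["example-password"], now=60))    # no answer
    print(rcm.call(["example-password"], now=300))   # connected
```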


Terminating a Modem Session 


Terminate the modem session by executing a hangup command from the RCM 
console firmware. This will cleanly terminate the modem connection. 


If the modem connection is terminated without using the hangup command, or if the 
line is dropped due to phone line problems, the RCM will detect carrier loss and 
initiate an internal hangup command. This process can take a minute or more, and 
the local terminal will be locked out until the auto hangup process completes. 


If the modem link is idle for more than 20 minutes, the RCM initiates an auto 
hangup. 

C.1.2 Entering and Leaving Command Mode


Use the default escape sequence to enter RCM command mode for the first time. 
You can enter RCM command mode from the SRM console level, the operating 
system level, or an application. The RCM quit command reconnects the 
terminal to the system console port. 


Example C-2 Entering and Leaving RCM Command Mode 


^]^]rcm ①
RCM>
RCM> quit ②

Focus returned to COM port


Entering the RCM Firmware Console 


To enter the RCM firmware console, enter the RCM escape sequence. See ① in
Example C-2 for the default sequence. 


The escape sequence is not echoed on the terminal or sent to the system. Once in the 
RCM firmware console, the user is in RCM command mode and can enter RCM 
console commands. 


Leaving Command Mode 


To leave RCM command mode and reconnect to the system console port, enter the 
quit command, then press Return to get a prompt from the operating system or 
system console. (See ②.)
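The statement that the escape sequence is neither echoed nor sent to the system implies that the RCM watches the character stream and withholds characters that might be part of the sequence. The Python sketch below shows one way such a filter could work. It is not the RCM firmware algorithm: the class name is hypothetical, and the default sequence is assumed to be two Ctrl/] characters (ASCII 0x1D) followed by lowercase "rcm".

```python
class EscapeFilter:
    """Pass characters through, but swallow the escape sequence when it appears."""

    def __init__(self, sequence="\x1d\x1drcm"):  # assumed: Ctrl/] Ctrl/] r c m
        self._seq = sequence
        self._held = ""

    def feed(self, char):
        """Return (text to forward to the system, True if the sequence completed)."""
        self._held += char
        forwarded = ""
        # Release held characters that can no longer begin the sequence.
        while not self._seq.startswith(self._held):
            forwarded += self._held[0]
            self._held = self._held[1:]
        if self._held == self._seq:
            self._held = ""
            return forwarded, True
        return forwarded, False


if __name__ == "__main__":
    f = EscapeFilter()
    sent = ""
    for ch in "ls\x1d\x1drcm":
        text, entered = f.feed(ch)
        sent += text
        if entered:
            print("enter RCM command mode")
    print(repr(sent))  # only 'ls' reached the system
```

Characters that turn out not to be part of the sequence are forwarded late rather than dropped, so ordinary typing that happens to include Ctrl/] still reaches the system.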

C.1.3 RCM Commands


The RCM commands summarized below are used to control and monitor a 
system remotely. 


Table C-1 RCM Command Summary 


Command Function 

alert_clr Clears alert flag, stopping dial-out alert cycle 
alert_dis Disables the dial-out alert function 

alert_ena Enables the dial-out alert function 

disable Disables remote access to the modem port 

enable Enables remote access to the modem port 

hangup Terminates the modem connection 

halt Halts server 

help or ? Displays the list of commands 

poweroff Turns off power to server 

poweron Turns on power to server 

quit Exits console mode and returns to system console port 
reset Resets the server 

setesc Changes the escape sequence for entering command mode 
setpass Changes the modem access password 

status Displays server’s status and sensors 
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Command Conventions 


• The commands are not case sensitive.
• A command must be entered in full.
• If a command is entered that is not valid, the command fails with the message:


*** ERROR - unknown command *** 


Enter a valid command. 
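These conventions amount to an exact, case-insensitive lookup against the command names in Table C-1. The Python sketch below models that lookup; the function name is hypothetical, and the handling of surrounding whitespace is an assumption.

```python
RCM_COMMANDS = {
    "alert_clr", "alert_dis", "alert_ena", "disable", "enable", "hangup",
    "halt", "help", "?", "poweroff", "poweron", "quit", "reset",
    "setesc", "setpass", "status",
}

UNKNOWN = "*** ERROR - unknown command ***"


def parse_command(line):
    """Return the normalized command name, or the unknown-command message."""
    word = line.strip().lower()   # commands are not case sensitive
    if word in RCM_COMMANDS:      # the full name is required, no abbreviations
        return word
    return UNKNOWN


if __name__ == "__main__":
    print(parse_command("QUIT"))   # quit
    print(parse_command("stat"))   # abbreviation is rejected
```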


The RCM commands are described on the following pages. 

C.1.3.1 alert_clr


The alert_clr command clears an alert condition within the RCM. The alert enable 
condition remains active, and the RCM will again enter the alert condition when it 
detects a system power failure. 


RCM>alert_clr 


C.1.3.2 alert_dis


The alert_dis command disables RCM dial-out capability. It also clears any
outstanding alerts. The alert disable state is nonvolatile. Dial-out capability remains
disabled until the alert_ena command is issued.


RCM>alert_dis 


C.1.3.3 alert_ena 


The alert_ena command enables the RCM to automatically dial out when it detects 
a power failure within the system. The RCM repeats the dial-out alert at 30-minute 
intervals until the alert is cleared. The alert enable state is nonvolatile. Dial-out 
capability remains enabled until the alert_dis command is issued.


RCM>alert_ena


In order for the alert_ena command to work, two conditions must be met:

• A modem dial-out string must be entered with the system console (see the
  set rem_dialout command in Table B-1).

• Remote access to the RCM modem port must be enabled with the enable
  command.


If the alert_ena command is entered when remote access is disabled, the
following message is returned:


*** error ***




C.1.3.4 disable 


The disable command disables remote access to the RCM modem port. 


RCM>disable 


The module’s remote access default state is DISABLED. The modem enable state is
nonvolatile. When the modem is disabled, it remains disabled until the enable
command is issued. If a modem connection is in progress, entering the disable
command terminates it.


C.1.3.5 enable


The enable command enables remote access to the RCM modem port. It can take up
to 10 seconds for the enable command to be executed.


RCM>enable 


The module’s remote access default state is DISABLED. 


The modem enable state is nonvolatile. When the modem is enabled, it remains 
enabled until the disable command is issued. 


The enable command can fail for two reasons: 
• There is no modem access password configured.
• The modem is not connected or is not working properly.


If the enable command fails, the following message is displayed: 
*** ERROR enable failed *** 
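
The two failure conditions and the persistent enable state can be summarized in a short model. This Python sketch is illustrative only: the class and attribute names are hypothetical, and the string returned on success is a placeholder, since the manual does not show a success message.

```python
class RemoteAccess:
    """Toy model of the RCM modem enable state."""

    def __init__(self, modem_ok=True):
        self.modem_ok = modem_ok
        self.password = None
        self.enabled = False  # default state is DISABLED

    def setpass(self, password):
        self.password = password

    def enable(self):
        # enable fails without a password or without a working modem
        if self.password is None or not self.modem_ok:
            return "*** ERROR enable failed ***"
        self.enabled = True   # stays enabled until disable is issued
        return "ok"           # placeholder, not actual RCM output

    def disable(self):
        self.enabled = False


if __name__ == "__main__":
    rcm = RemoteAccess()
    print(rcm.enable())       # fails: no password configured yet
    rcm.setpass("example")
    print(rcm.enable())       # succeeds in this model
```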


C.1.3.6 hangup 


The hangup command terminates the modem session. When this command is 
issued, the remote user is disconnected from the server. This command can be 
issued from either the local or remote console. 


RCM>hangup 

C.1.3.7 halt


The halt command attempts to halt the managed system. It is functionally 
equivalent to pressing the Halt button on the system operator control panel to the 
“in” position and then releasing it to the “out” position. The RCM console firmware 
exits command mode and reconnects the user’s terminal to the server’s COM1 serial
port. 


RCM>halt 
Focus returned to COM port 


NOTE: Pressing the Halt button has no effect on systems running Windows NT. 


C.1.3.8 help or ?


The help or ? command displays the RCM firmware command set. 


C.1.3.9 poweroff 


The poweroff command requests the RCM module to power off the system. It is 
functionally equivalent to turning off the system power from the operator control 
panel. 


RCM>poweroff 


If the system is already powered off, this command has no effect. 


The external power to the RCM must be connected in order to power off the system 
from the RCM firmware console. If the external power supply is not connected, the 
command will not power the system down, and displays the message: 


*** ERROR ***


C.1.3.10 poweron


The poweron command requests the RCM module to power on the system. For the 
system power to come on, the following conditions must be met: 


• AC power must be present at the power supply inputs.
• The DC On/Off button must be in the “on” position.
• All system interlocks must be set correctly.


The RCM firmware console exits command mode and reconnects the user’s terminal 
to the system console port. 


RCM>poweron 
Focus returned to COM port 


NOTE: If the system is powered off with the DC On/Off button, the system will not 
power up. The RCM will not override the “off” state of the DC On/Off button. If the 
system is already powered on, the poweron command has no effect. 
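
When a remote poweron does not bring the system up, the three conditions above form a natural checklist. The Python sketch below expresses that checklist; the function and parameter names are hypothetical, and the inputs would come from someone inspecting the system, not from the RCM itself.

```python
def poweron_blockers(ac_present, dc_button_on, interlocks_ok):
    """Return the unmet conditions that would keep poweron from working."""
    checks = [
        (ac_present, "AC power is not present at the power supply inputs"),
        (dc_button_on, "the DC On/Off button is not in the on position"),
        (interlocks_ok, "a system interlock is not set correctly"),
    ]
    return [reason for ok, reason in checks if not ok]


if __name__ == "__main__":
    for reason in poweron_blockers(True, False, True):
        print("poweron blocked:", reason)
```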


C.1.3.11 quit 


The quit command exits the user from command mode and reconnects the user’s 
terminal to the system console port. The following message is displayed: 


Focus returned to COM port 


The next display depends on what the system was doing when the RCM was 
invoked. For example, if the RCM was invoked from the SRM console prompt, the 
console prompt will be displayed when you enter a carriage return. Or, if the RCM 
was invoked from the operating system prompt, the operating system prompt will be 
displayed when you enter a carriage return. 

C.1.3.12 reset


The reset command requests the RCM module to perform a hardware reset. It is 
functionally equivalent to pressing the Reset button on the system operator control 
panel. 


RCM>reset 
Focus returned to COM port 


The following events occur when the reset command is executed: 
• The system restarts and the system console firmware reinitializes.

• The console exits RCM command mode and reconnects the user’s terminal to
  the server’s COM1 serial port.

• The power-up messages are displayed, and then the console prompt is displayed
  or the operating system boot messages are displayed, depending on the state of
  the Halt button.


C.1.3.13 setesc 


The setesc command allows the user to reset the default escape sequence for entering 
console mode. The escape sequence can be any character string. A typical sequence 
consists of 2 or more characters, to a maximum of 15 characters. The escape 
sequence is stored in the module’s on-board NVRAM. 


NOTE: If you change the escape sequence, be sure to record the new sequence. 
Although the module factory defaults can be restored if the user has forgotten the 
escape sequence, this involves accessing the server control module and moving a 
jumper. 


The following sample escape sequence consists of five iterations of the Ctrl key and 
the letter “o”.

RCM>setesc

^o^o^o^o^o

RCM> 




If the escape sequence entered exceeds 15 characters, the command fails with the 
message: 


*** ERROR ***


When changing the default escape sequence, avoid using special characters that are 
used by the system’s terminal emulator or applications. 


Control characters are not echoed when entering the escape sequence. To verify the 
complete escape sequence, use the status command. 


C.1.3.14 setpass 


The setpass command allows the user to change the modem access password that is 
prompted for at the beginning of a modem session. The password is stored in the 
module’s on-board NVRAM. 


RCM>setpass 
new pass>*********
RCM> 


The maximum password length is 15 characters. If the password entered exceeds 15 
characters, the command fails with the message: 


*** ERROR ***


The minimum password length is one character, followed by a carriage return. If 
only a carriage return is entered, the command fails with the message: 


*** ERROR - illegal password ***


If the user has forgotten the password, a new password can be entered. 
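
The length rules for setpass can be summarized in a few lines of code. This is only an illustrative sketch of the rules described above; check_password is a hypothetical helper, not an RCM routine, and the error texts are copied from the messages shown in this section.

```python
MAX_PASSWORD_LEN = 15  # maximum length stated in the text above


def check_password(entry: str) -> str:
    """Apply the setpass length rules to one line of user input."""
    password = entry.rstrip("\r\n")  # drop the terminating carriage return
    if len(password) == 0:
        # Only a carriage return was entered.
        raise ValueError("*** ERROR - illegal password ***")
    if len(password) > MAX_PASSWORD_LEN:
        raise ValueError("*** ERROR ***")
    return password
```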
C.1.3.15 status


The status command displays the current state of the server’s sensors, as well as the 
current escape sequence and alarm information. 


RCM>status 


Firmware Rev: V1.0
Escape Sequence: ^]^]RCM
Remote Access: ENABLE/DISABLE
Alerts: ENABLE/DISABLE
Alert Pending: YES/NO
Temp (C): 26.0
RCM Power Control: ON/OFF
External Power: ON
Server Power: OFF


RCM> 


The status fields are explained in Table C-2. 


Table C-2 RCM Status Command Fields


Item                  Description

Firmware Rev:         Revision of RCM firmware.

Escape Sequence:      Current escape sequence to enter RCM firmware
                      console.

Remote Access:        Modem remote access state. (ENABLE/DISABLE)

Alerts:               Alert dial-out state. (ENABLE/DISABLE)

Alert Pending:        Alert condition triggered. (YES/NO)

Temp (C):             Current system temperature in degrees Celsius.

RCM Power Control:    Current state of RCM system power control.
                      (ON/OFF)

External Power:       Current state of power from external power supply
                      to server control module. (ON/OFF)

Server Power:         Current state of system power. (ON/OFF)
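
If the status display is captured by a terminal emulator or script, the "name: value" layout is easy to split into fields. The sketch below assumes the output looks like the sample in this section; parse_status and the sample values are illustrative only and are not part of the RCM firmware.

```python
def parse_status(text: str) -> dict[str, str]:
    """Split captured 'RCM>status' output into a name -> value mapping."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the RCM> prompt lines.
        if not line or line.startswith("RCM>"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


SAMPLE = """\
RCM>status

Firmware Rev: V1.0
Escape Sequence: ^]^]RCM
Remote Access: ENABLE
Alerts: DISABLE
Alert Pending: NO
Temp (C): 26.0
RCM Power Control: ON
External Power: ON
Server Power: OFF
RCM>
"""

if __name__ == "__main__":
    status = parse_status(SAMPLE)
    print(status["Server Power"], float(status["Temp (C)"]))
```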
C.1.4 Dial-Out Alerts


The RCM can be configured to automatically dial out through the modem 
(usually to a paging service) when it detects a power failure within the system. 
When a dial-out alert is triggered, the RCM initializes the modem for dial-out, 
sends the dial-out string, hangs up the modem, and reconfigures the modem for 
dial-in. The RCM and modem must continue to be powered, and the phone line 
must remain active, for the dial-out alert function to operate. 


Example C-3 Configuring the Modem for Dial-Out Alerts 


P00>>> set rcm_dialout "ATDTstring#;"      ❶


RCM>enable
RCM>status
Remote Access: ENABLE                      ❷
RCM>alert_ena                              ❸


Example C-4 Typical RCM Dial-Out Command 


P00>>> set rcm_dialout "ATXDT9,15085553333,,,,,,5085553332#;"


Use the show command to verify the RCM dial-out string:


P00>>> show rcm_dialout
rcm_dialout             ATXDT9,15085553333,,,,,,5085553332#;
Enabling the Dial-Out Alert Function:


1. Enter the set rcm_dialout command, followed by a dial-out alert string, from
the SRM console (see ❶ in Example C-3).


The string is a modem dial-out character string, not to exceed 47 characters, that
is used by the RCM when dialing out through the modem. See the next topic for
details on composing the modem dial-out string.


2. Enter the RCM firmware console and enter the enable command to enable
remote access dial-in. The RCM firmware status command should display
“Remote Access: ENABLE.” (See ❷.)


3. Enter the RCM firmware alert_ena command to enable outgoing alerts. (See
❸.)
Composing a Modem Dial-Out String


The modem dial-out string emulates a user dialing an automatic paging service. 
Typically, the user dials the pager phone number, waits for a tone, and then enters a 
series of numbers. 


The RCM dial-out string (Example C-4) has the following requirements: 


• The entire string following the set rcm_dialout command must be enclosed by
quotation marks.


• The characters ATDT must be entered after the opening quotation marks. Do
not mix case. Enter the characters either in all uppercase or all lowercase.


• Enter the character X if the line to be used also carries voice mail. Refer to the
example that follows.


• The valid characters for the dial-out string are the characters on a phone keypad:
0-9, *, and #. In addition, a comma (,) requests that the modem pause for 2
seconds, and a semicolon (;) is required to terminate the string.


Elements of the Dial-Out String


ATXDT          AT = Attention
               X = Forces the modem to dial “blindly” (not look for a dial tone).
                   Enter this character if the dial-out line modifies its dial tone
                   when used for services such as voice mail.
               D = Dial
               T = Tone (for touch-tone)
               , = Pause for 2 seconds.

9,             In the example, “9” gets an outside line. Enter the number for an
               outside line if your system requires it.

15085553333    Dial the paging service.

,,,,,,         Pause for 12 seconds for paging service to answer.

5085553332#    “Message,” usually a call-back number for the paging service.

;              Return to console command mode. Must be entered at end of
               string.
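
The composition rules above can be checked mechanically before the string is entered at the SRM console. The following sketch assembles a dial-out string from its elements; build_dialout and its parameters are hypothetical names, and it assumes that the 47-character limit applies to the whole string between the quotation marks and that the message always ends with #.

```python
import math

MAX_DIALOUT_LEN = 47                # limit given for the dial-out string
SECONDS_PER_COMMA = 2               # each comma pauses the modem for 2 seconds
KEYPAD_CHARS = set("0123456789*#")  # valid phone keypad characters


def build_dialout(pager_number: str, message: str, outside_line: str = "",
                  wait_seconds: int = 12, blind_dial: bool = False) -> str:
    """Assemble a dial-out string of the form ATDT<number>,,,<message>#;"""
    for label, digits in (("pager_number", pager_number),
                          ("message", message),
                          ("outside_line", outside_line)):
        if not set(digits) <= KEYPAD_CHARS:
            raise ValueError(f"{label} may contain only 0-9, * and #")
    if not pager_number:
        raise ValueError("pager_number is required")

    prefix = "ATXDT" if blind_dial else "ATDT"  # X = dial without dial tone
    commas = "," * math.ceil(wait_seconds / SECONDS_PER_COMMA)
    line_access = outside_line + "," if outside_line else ""
    dial_string = f"{prefix}{line_access}{pager_number}{commas}{message}#;"

    if len(dial_string) > MAX_DIALOUT_LEN:
        raise ValueError("dial-out string exceeds 47 characters")
    return dial_string


if __name__ == "__main__":
    print(build_dialout("15085553333", "5085553332",
                        outside_line="9", wait_seconds=12, blind_dial=True))
```

With the values used in Example C-4, the result has the same elements in the same order: the ATXDT prefix, the outside-line digit and pause, the paging service number, six commas for the 12-second wait, the call-back number with #, and the terminating semicolon.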
C.1.5 Resetting the RCM to Factory Defaults


If the escape sequence has been forgotten, you can reset the RCM to its factory
settings.


Reset Procedure 


1. Power down the AlphaServer system and access the server control module, as 
follows: 


Expose the PCI bus card cage. Remove three Phillips head screws holding the 
cover in place and slide it off the drawer. If necessary, remove several PCI and 
EISA options from the bottom of the PCI card cage until you have enough space 
to access the server control module. 


2. Unplug the external power supply to the server control module. 


Locate the password and option reset jumper. The jumper number, which is 
etched on the board, depends on the revision of the server control module. 


NOTE: If the RCM section of the server control module does not have an 
orange relay, the jumper number is J6. If the RCM section of the 
server control module has an orange relay, the jumper number is J7. 


3. Move the jumper so that it is sitting on both pins. 


4. Replace any panels or covers as necessary so you can power up the system. Press 
the Halt button and then power up the system to the SRM console prompt. 
Powering up with the password and option reset jumper in place resets the escape 
sequence, password, and modem enable states to the factory default. 


5. When the console prompt is displayed, power down the system and move the 
password and option reset jumper back onto the single pin. 


6. Replace any PCI or EISA modules you removed and replace the PCI bus card 
cage cover. 


7. Power up the system to the SRM console prompt and type the default escape 
sequence to enter RCM command mode: 


^]^]RCM


8. Configure the module as desired. You must reset the password and modem 
enable states in order to enable remote access. 
C.1.6 Troubleshooting Guide


Table C-3 lists a number of possible causes and suggested solutions for
symptoms you might see.


Table C-3 RCM Troubleshooting


Symptom                   Possible Cause                     Suggested Solution

The local terminal will   System and terminal baud rate      Set the system and terminal
not communicate with the  set incorrectly.                   baud rates to 9600 baud.
system or the RCM.

                          Cables not correctly installed.    Review external cable
                                                             installation.

RCM will not answer when  Modem cables may be incorrectly    Check modem phone lines and
the modem is called.      installed.                         connections.

                          RCM remote access is disabled.     Enable remote access.

                          RCM does not have a valid          Set password and enable remote
                          password set.                      access.

                          The local terminal is currently    Issue a quit command on the
                          in the RCM console firmware.       local terminal.

                          On power-up, the RCM defers        Wait 30 seconds after powering
                          initializing the modem for 30      up the system and RCM before
                          seconds to allow the modem to      attempting to dial in.
                          complete its internal diagnostics
                          and initialization.

                          Modem may have had power cycled    Enter enable command from RCM
                          since last being initialized or    console.
                          modem is not set up correctly.

After the system and RCM  This delay is normal behavior.     Wait a few seconds for the COM
are powered up, the COM                                      port to start working.
port seems to hang and
then starts working
after a few seconds.

RCM installation is       RCM Power Control: is set to       Enter RCM console and issue
complete, but system      DISABLE.                           the poweron command.
will not power up.

New password, escape      The password and option reset      After resetting RCM to factory
sequence, and modem       jumper is still installed. If the  defaults, move the jumper so
enable state are          RCM section of the server control  that it is sitting on only one
forgotten when system     module does not have an orange     pin.
and RCM module are        relay, the jumper number is J6.
powered down.             If it does have an orange relay,
                          the number is J7.

The remote user sees a    The modem is confirming whether    This is normal behavior.
“+++” string on the       the modem has really lost
screen.                   carrier. This occurs when the
                          modem sees an idle time,
                          followed by a “3,” followed by a
                          carriage return, with no
                          subsequent traffic. If the modem
                          is still connected, it will
                          remain so.

The message “unknown      The terminal or terminal emulator  Change the terminal or terminal
command” is displayed     is including a linefeed character  emulator setting so that “new
when the user enters a    with the carriage return.          line” is not selected.
carriage return by
itself.

Cannot enable modem or    The modem is not configured        Modify the modem initialization
modem will not answer.    correctly to work with the RCM.    and/or answer string.
C.1.7 Modem Dialog Details


This section provides further details on the dialog between the RCM and the 
modem and is intended to help you reprogram your modem if necessary. 


Phases of Modem Operation 


The RCM is programmed to expect specific responses from the modem during four 
phases of operation: 


• Initialization
• Ring detection
• Answer
• Hang-up


The initialization and answer command strings are stored in the RCM NVRAM. The 
factory default strings are: 


Initialization string: AT&F0EVS0=0S12=50<cr>


Answer string: ATXA<cr>


NOTE: All modem commands must be terminated with a <cr> character (0x0d
hex).

Initialization


The RCM initializes the modem to the following configuration:

Factory defaults (&F0)
No Echo (E)
Numeric response codes (V)
No Auto Answer (S0=0)
Guard-band = 1 second (S12=50)
Fixed modem-to-RCM baud rate
Connect at highest possible reliability and speed


The RCM expects to receive a “0<cr>” (OK) in response to the initialization string.
If it does not, the enable command will fail.
This default initialization string works on a wide variety of modems. If your modem
does not configure itself to these parameters, the initialization string will need to be
modified. See the topic in this section entitled Modifying Initialization and Answer
Strings.
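
When adapting the initialization string for another modem, it can help to see which S-registers the string sets. The sketch below extracts the S-register assignments from the default string and converts S12 to seconds. It assumes the usual Hayes convention that S12 is counted in units of 1/50 second, which agrees with the 1-second guard band listed above; the function name is hypothetical.

```python
import re

INIT_STRING = "AT&F0EVS0=0S12=50"  # factory default from this section
S12_UNITS_PER_SECOND = 50          # assumed Hayes convention: 1/50 s per count


def s_registers(init_string: str) -> dict[int, int]:
    """Return the S-register assignments (Sn=value) found in an AT string."""
    pairs = re.findall(r"S(\d+)=(\d+)", init_string.upper())
    return {int(reg): int(val) for reg, val in pairs}


regs = s_registers(INIT_STRING)
guard_seconds = regs[12] / S12_UNITS_PER_SECOND

if __name__ == "__main__":
    print(regs, guard_seconds)
```

Here S0=0 disables auto answer and S12=50 works out to a guard time of 50/50 = 1 second.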


Ring Detection 


The RCM expects to be informed of an in-bound call by the modem signaling the 
RCM with the string, “2<cr>” (RING). 


Answer 


When the RCM receives the ring message from the modem, it responds with the 
answer string. The “X” command modifier used in the default answer string forces 
the modem to report simple connect, rather than connect at xxxx. The RCM expects 
a simple connect message, “1<cr>” (CONNECTED). If the modem responds with 
anything else, the RCM forces a hang-up and initializes the modem. 


The default answer string is formatted to request the modem to provide only basic 
status. If your modem does not provide the basic response, the answer string, and/or 
initialization string will need to be modified. See the topic in this section entitled 
Modifying Initialization and Answer Strings. 


After receiving the “connect” status, the RCM waits for 6 seconds and then
prompts the user for a password.


Hangup 


When the RCM is requested to hang up the modem, it forces the modem into
command mode and issues the hangup command to the modem. This is done by
pausing for at least the guard time and then sending the modem “+++”. When the
modem responds with “0<cr>” (OK), the hang-up command string is sent. The
modem should respond with “3<cr>” (NO CARRIER). After this interchange, the
modem is reinitialized in preparation for the next dial-in session.
RCM/Modem Interchange Overview


Table C-4 summarizes the actions between the RCM and the modem from 
initialization to hangup. 


Table C-4 RCM/Modem Interchange Summary


Action                      Data to Modem             Data from Modem

Initialization command      AT&F0EVS0=0S12=50<cr>

Initialization successful                             0<cr>

Phone line ringing                                    2<cr>

RCM answering               ATXA<cr>

Modem successfully                                    1<cr>
connected

Force modem into            <guard_band>+++
command mode

Modem in command mode                                 0<cr>

Hangup                      ATH<cr>

Successful hangup                                     3<cr>
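
The interchange in Table C-4 can be replayed against a scripted list of modem replies to see where a non-standard modem would break the dialog. This is a simplified sketch for illustration only: run_session and the outcome texts are hypothetical, the guard-time pause is not modeled, and it does not represent the actual RCM firmware logic.

```python
INIT = "AT&F0EVS0=0S12=50\r"   # default initialization string
ANSWER = "ATXA\r"              # default answer string
ESCAPE = "+++"                 # sent after pausing for the guard time
HANGUP = "ATH\r"

# Numeric response codes the RCM expects from the modem.
OK, CONNECT, RING, NO_CARRIER = "0\r", "1\r", "2\r", "3\r"


def run_session(modem_replies):
    """Replay one dial-in session; return (data sent to modem, outcome)."""
    replies = iter(modem_replies)
    sent = []

    def exchange(command, expected):
        # Send an optional command, then compare the next scripted reply.
        if command is not None:
            sent.append(command)
        return next(replies, None) == expected

    if not exchange(INIT, OK):
        return sent, "enable failed"
    if not exchange(None, RING):
        return sent, "no incoming call"
    if not exchange(ANSWER, CONNECT):
        return sent, "unexpected answer response: hang up and reinitialize"
    # The remote user's session would take place here.
    if not exchange(ESCAPE, OK):
        return sent, "modem did not enter command mode"
    if not exchange(HANGUP, NO_CARRIER):
        return sent, "hang-up not confirmed"
    return sent, "session complete"


if __name__ == "__main__":
    print(run_session([OK, RING, CONNECT, OK, NO_CARRIER]))
```

A modem that answers the initialization string with anything other than 0<cr>, for example a verbose "OK", stops the dialog at the first step, which corresponds to the enable command failing.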


Modifying Initialization and Answer Strings 


The initialization and answer strings are stored in the RCM’s NVRAM. They come
pre-programmed to support a wide selection of modems. If the default
initialization and answer strings do not set the modem into the desired mode, use
the following SRM set and show commands to define and examine the
initialization and answer strings.


To replace the initialization string:


P00>>> set rcm_init "new_init_string"


To replace the answer string:


P00>>> set rcm_answer "new_answer_string"
To display all the RCM user-settable strings:


P00>>> show rcm*
rcm_answer              ATXA
rcm_dialout
rcm_init                AT&F0EVS0=0S12=50
P00>>>

Initialization and Answer String Substitutions 

The RCM default initialization and answer strings are as follows: 
Initialization String: “AT&F0EVS0=0S12=50”

Answer String: “ATXA”


The following modem requires a modified answer string.


Modem                      Initialization String    Answer String

USRobotics Sportster       RCM default              “ATX0&B1&A0A”
28,800 Data/Fax Modem
?


? command, RCM, C-10 


4 
4000 system drawer, 1-4, 1-6 
4100 system drawer, 1-2 


A 


alert_clr command, RCM, C-8 
alert_dis command, RCM, C-8 
alert_ena command, RCM, C-8 
Alpha 21164 microprocessor, 1-16 
Alpha chip composition, 1-20 
AlphaBIOS console, 1-15 
loading, 2-7 
upgrading, A-28 
Architecture, system, 1-16 
auto_action environment variable, 
SRM, 2-23 
Auxiliary voltage (vaux), 4-9


B


B3002-AA CPU module, 1-21
B3002-AB CPU module, 1-21 
B3002-BA CPU module, 1-21 
B3004-AA CPU module, 1-21 
B3004-DA CPU module, 1-21 
B3020-CA memory module, 1-23, 7-3 
B3030-EA memory module, 1-23, 7-3 
B3030-FA memory module, 1-23, 7-3 
B3030-GA memory module, 1-23 
B3040-AA bridge module, 1-28, 7-3 
B3040-AB bridge module, 7-3 


B3050-AA PCI motherboard, 1-30, 7-3

B3051-AA PCI motherboard, 7-3 

BA30A system drawer, 1-2 

BA30B system drawer, 1-6 

BA30C system drawer, 1-4 

B-cache, 2-21, 2-23 

Bridge module (B3040-AA) 
removal and replacement, 7-26 

Bridge module (B3040-AB) 
removal and replacement, 7-28 

Bridge module LEDs, 3-3 


C 


Cabinet differences, 1-9 
Cabinet fan tray 
fan removal and replacement, 7-66
fan tray fan fail detect module removal and replacement, 7-68
power supply removal and replacement, 7-64
removal and replacement, 7-62
Cabinet system, 1-8
power and fan LEDs, 3-4
power supply for remote access, 3-5
Cables and jumpers, system drawer, 7-5, 7-6
Cables, pedestal, 7-7
CAP Error Register, 6-11
CAP Error Register Data Pattern, 5-46
CAP_ERR Register, 6-11 




CD-ROM 
removal and replacement, 7-60 
COM1 port, 2-19 
Command codes, 5-54 
Command summary (SRM), B-2 
Components 
housed in system drawer, 1-2, 1-4, 1-6
Console 
SRM, 2-23 
Console device determination, 2-18 
Console device options, 2-19 
Console device, changing, 2-19 
console environment variable, SRM, 2-21, 2-23
Console power-up tests, 2-16 
Control panel, 1-12, 2-2 
display, 2-21 
Halt button, 1-13 
LCD potentiometer, 2-2 
messages in display, 2-3 
Controls 
Halt button, 1-13 
Cover interlocks, 1-3, 1-5, 1-7, 4-7 
overriding, 4-7 
removal and replacement, 7-50, 7-52
CPU and bridge module LEDs, 3-2 
CPU LEDs, 3-3 
CPU module, 1-20 
configuration rules, 1-21 
removal and replacement, 7-18 
variants, 1-21 
CPU modules, 1-17, 7-3 


D 


DECevent, 5-6 

report formats, 5-10 
DIAGNOSE command, 5-7 
Diagnostics, test command, 3-12 
disable command, RCM, C-9 
display command (LFU), A-24, A-25 
Double error halt, 5-57 
Drives, CD-ROM and floppy, 1-12 


E


ECC syndrome bits, 5-53
ECU, running, A-4 
EL_ADDR Register, 6-6 
EL_STAT Register, 6-2 
enable command, RCM, C-9 
Environment variables 
SRM console, B-4 
Environment variables, SRM, 1-15 
auto_action, 2-23 
console, 2-21, 2-23 
os_type, 2-23 
Error detector placement, 5-2 
Error log events, 5-5 
Error registers, 6-1 
Event files, translating, 5-7 
Events, filtering, 5-8 
exit command (LFU), A-13, A-19, A-23, A-24, A-25
External Interface Address Register, 6-6
External Interface Registers 
loading and locking rules, 6-7 
External Interface Status Register, 6-2 


F 

Fail-safe loader, 2-24 

Fan removal and replacement, 7-48 

Fan tray cables (cabinet), 7-4 

Fan tray, cabinet system, 1-9 

Fan tray, LEDs, 3-5 

Fans, 7-3 

Fans, top of cabinet, 3-5 

Fatal errors, 5-5 

FEPROM 
and XSROM test flow, 2-13 
defined, 2-5 

Firmware 
RCM, C-6 
updating, A-8 
updating from AlphaBIOS, A-27 
updating from CD-ROM, A-9 
updating from floppy disk, A-14, A-16
updating from network device, A-20
updating, AlphaBIOS selection, A-6
updating, SRM command, A-6
Floppy 
removal and replacement, 7-58 
FRU list, 7-2 
4000 power system, 7-10 
4100 power system, 7-8 
FRU part numbers, 7-3 


G 


Graphics monitor, VGA, 2-19 


H 


H7600-AA power controller, 1-9 
H7600-DB power controller, 1-9 
halt command, RCM, C-10 
Halts 

caused by power problem, 3-6 
hangup command, RCM, C-9 
Hard errors, categories of, 5-4 
help command (LFU), A-24, A-25 
help command, RCM, C-10 


I


I squared C bus, 3-10
INFO 3 command, 5-58 
INFO 5 command, 5-60 
INFO 8 command, 5-62 
Initialization and answer strings 
modifying for modem, C-24 
substitutions, C-25 
Interlock switches, 7-50, 7-52 
IOD, 2-23 
IOD detected failure 
PCI error, 5-32 
System bus error, 5-27 
IOD error interrupts, 5-5 
IOD, defined, 5-2 


L 
LCD, 2-2 
LEDs 
troubleshooting with, 3-2 
LEDs, fan and power in cabinet, 3-5 
LFU 
exit command, A-25 
starting, A-6, A-8 
starting the utility, A-6 
typical update procedure, A-8 
update command, A-26 
updating firmware from CD-ROM, A-9
updating firmware from floppy disk, A-14, A-16
updating firmware from network device, A-20
lfu command (LFU), A-17, A-19, A-24, A-25
LFU commands
display, A-24, A-25
exit, A-13, A-19, A-23, A-24, A-25
help, A-24, A-25
lfu, A-17, A-19, A-24, A-25
list, A-11, A-17, A-19, A-21, A-23, A-24, A-26
readme, A-24, A-26
summary, A-24
update, A-13, A-24, A-26
verify, A-24, A-26
list command (LFU), A-11, A-17, A-21, A-24, A-26
Loadable Firmware Update utility. See LFU


M 


Machine checks in PAL mode, 5-57 
Maintenance bus, 3-10 

Maintenance bus controller, 3-10 

MC Error Information Register 0, 6-8
MC Error Information Register 1, 6-9
MC_ERR0 Register, 6-8
MC_ERR1 Register, 6-9
MCHK 620 correctable error, 5-44
MCHK 630 correctable CPU error, 5-41
MCHK 660 IOD detected failure, 5-27, 5-32
MCHK 670 CPU and IOD detected failure, 5-16
MCHK 670 CPU-detected failure, 5-11
MCHK 670 read dirty failure, 5-21
Memory addressing, 1-24
rules, 1-25
Memory errors
corrected read data error, 5-52
read data substitute error, 5-52
Memory modules, 1-17, 1-22, 7-3
removal and replacement, 7-22
variants, 1-23
Memory operation, 1-23
Memory option
configuration rules, 1-23
Memory pairs, 1-23
Memory tests, 2-14, 2-21
Memory, broken, 5-52
Modem, C-2
answer, C-23
dial-in procedure, C-4
hangup, C-23
phases of operation, C-22
ring detection, C-23


N


Node IDs, 5-55
NVRAM, 2-3, 2-8, 7-34


O


Operator control panel removal and replacement
cabinet system, 7-54
pedestal system, 7-56
os_type environment variable, SRM, 2-7, 2-23


P 


Page table entry invalid error, 5-51 
PALcode, 2-23 
PALcode, described, 5-56 
PCI Error Status Register 1, 6-14 
PCI I/O subsystem, 1-30 
PCI master abort, 5-51 
PCI motherboard, 1-31 
PCI motherboard (B3050) 
removal and replacement, 7-34
PCI motherboard (B3051)
removal and replacement, 7-36
PCI parity error, 5-51 
PCI system error, 5-51 
PCI/EISA option 
removal and replacement, 7-40 
PCI_ERR Register, 6-14 
Pedestal system, 1-10 
PIO buffer overflow error 
(PIO_OVFL), 5-50 
Potentiometer, 2-2 
Power circuit 
and cover interlocks, 4-6 
diagram, 4-6 
failures, 4-7 
Power configuration rules 
cabinet system, 4-10 
pedestal system, 4-14, 4-15 
redundancy, 1-37 
Power control module, 1-17, 1-34 
LED states, 3-9 
removal and replacement, 7-24 
Power control module features, 4-4 
Power control module LEDs, 3-8 
Power cords, internal, 7-4 
Power faults, 4-9 
Power harness 
removal and replacement, 7-44, 7-46
Power problems
at power-up, 3-7
Power supply, 1-36
fault protection, 4-3
outputs, 4-2
removal and replacement, 7-42
voltages, 4-3
Power system components, 7-4
poweroff command, RCM, C-10
poweron command, RCM, C-11
Power-up
SROM and XSROM messages during, 2-19
Power-up display, 2-20
Power-up sequence, 2-4
Power-up/down sequence, 4-8
Processor
determining primary, 2-21
Processor correctable error, 5-5 
Processor machine checks, 5-5 


Q 


quit command, RCM, C-11 


R 


RAID Standalone Configuration 
Utility, running, A-5 

RCM, C-1 

command summary, C-6 

dial-out alerts, C-15 

entering and leaving command 

mode, C-5 

modem usage, C-2 

resetting to factory defaults, C-18 

troubleshooting, C-19 

typical dialout command, C-15 
RCM commands 

?, C-10 

alert_clr, C-8 

alert_dis, C-8 

alert_ena, C-8 

disable, C-9 

enable, C-9 

halt, C-10 

hangup, C-9 

help, C-10 

poweroff, C-10 

poweron, C-11 

quit, C-11 

reset, C-12 


setesc, C-12 

setpass, C-13 

status, C-14 
rcm_dialout command, C-15 
readme command (LFU), A-24, A-26 
Redundant power, 1-37 
Registers, 6-1 
Remote console monitor. See RCM 
Remote console monitor module, 1-32 
reset command, RCM, C-12 


S 


Safety guidelines, 7-1 
Serial number, system, 7-30, 7-32 
restoring with set sys_serial_num, 7-31, 7-33
Serial ports, 1-31
Serial terminal, 2-19
Server control module, 1-32
removal and replacement, 7-38
Server control module power, 7-5
set sys_serial_num command, 7-31, 7-33
setesc command, RCM, C-12 
setpass command, RCM, C-13 
show power command (SRM), 1-37 
Soft errors, categories of, 5-4 
SRM commands 
show power, 1-37 
SRM console, 1-15, 2-23 
SROM, 2-21 
defined, 2-4 
errors, 2-11 
power-up test flow, 2-8 
tests, 2-10 
Standard I/O, 1-32 
status command, RCM, C-14 
StorageWorks shelf removal and 
replacement, 7-70 
sys_model_number environment 
variable, 7-34 
sys_type environment variable, 7-34 
System bus, 1-17, 1-26 
System bus address parity error, 5-49 




System bus ECC error, 5-47
System bus nonexistent address error, 5-48
System bus to PCI bus bridge module, 1-17, 1-28
System bus to PCI/EISA bus bridge module, 1-17
System consoles, 1-14 
System correctable errors, 5-5 
System drawer 
components of, 1-2, 1-4, 1-6 
FRU locations, 7-2 
fully configured, 1-17 
remote operation, C-1 
System drawer exposure 
original cabinet, 7-12 
pedestal, 7-16 
System drawer modules, 7-3 
System machine checks, 5-5 
System model number, displaying, 7-34
System motherboard, 1-18
System motherboard (4000)
removal and replacement, 7-32
System motherboard (4100 & early 4000)
removal and replacement, 7-30


T 


Test command 
for entire system, 3-13 
Test mem command, 3-15 
Test pci command, 3-17
Troubleshooting 
failures at power-up, 3-7 
IOD detected errors, 5-46 
power problems, 3-6 
using error logs, 5-2 


U 


update command (LFU), A-13, A-19, A-23, A-24, A-26
Updating firmware 
AlphaBIOS console, A-27 
from AlphaBIOS console, A-6 
from SRM console, A-6 
Utility programs 
running from graphics monitor, A-2
running from serial terminal, A-3 


V 


verify command (LFU), A-24, A-26 


X


XBUS, 1-31
XSROM 
defined, 2-4 
errors, 2-15 
power-up test flow, 2-12 
tests, 2-13 


